Improving Treebank-Based Automatic LFG Induction for Spanish

Grzegorz Chrupala and Josef van Genabith

Proceedings of LFG06; CSLI Publications On-line

Abstract

We describe several improvements to the method of treebank-based LFG induction for Spanish from the Cast3LB treebank [10]. We discuss the different categories of problems encountered and present the solutions adopted. Some of the problems are solved by simply adopting existing linguistic analyses, as in our treatment of clitic doubling and null subjects. In other cases there is no standard LFG account of the phenomenon we wish to model, and we adopt a conservative compromise solution. This is exemplified by our treatment of Spanish periphrastic constructions. In yet another case, the less configurational nature of Spanish means that the LFG annotation algorithm has to rely mostly on Cast3LB function tags, and consequently a reliable method of adding those tags to parse trees had to be developed. This method achieves an improvement of over 6% over the baseline for the Cast3LB function-tag assignment task, and of over 3% over the baseline for LFG f-structure construction from function-tag-enriched trees.