Corpus-based learning of OT constraint rankings for large-scale LFG grammars

Martin Forst, Jonas Kuhn and Christian Rohrer

Abstract

Proceedings of LFG05; CSLI Publications On-line

We discuss a two-stage disambiguation technique for linguistically precise broad-coverage grammars: the first stage is a pre-filter triggered by linguistic configurations (optimality marks) specified by the grammar writer; the second is a log-linear probability model trained on corpus data. This set-up is used in the Parallel Grammar (ParGram) project, which develops Lexical-Functional Grammars for various languages. The present paper is the first study to explore how the pre-filter can be tuned empirically by learning a relative ranking of the optimality marks from corpus data, identifying problematic marks, and relaxing the filter in various ways.
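To make the pre-filter idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: each candidate analysis carries a multiset of optimality marks, a ranking of marks (most dispreferred first) induces a lexicographic comparison of mark-count profiles, and only analyses with an optimal profile survive into the second stage. All names and data here are hypothetical.

```python
# Illustrative OT-style pre-filter (not the authors' code).
# An analysis survives iff its mark-count profile, read off in ranking
# order, is lexicographically minimal among all candidate analyses.
from collections import Counter

def profile(marks, ranking):
    """Count vector of optimality marks, ordered by the given ranking."""
    counts = Counter(marks)
    return tuple(counts[m] for m in ranking)

def ot_prefilter(analyses, ranking):
    """Keep only analyses whose mark profile is lexicographically minimal."""
    best = min(profile(a["marks"], ranking) for a in analyses)
    return [a for a in analyses if profile(a["marks"], ranking) == best]

# Hypothetical candidate analyses with hypothetical mark names.
analyses = [
    {"id": 1, "marks": ["DISPREF_TOPIC"]},
    {"id": 2, "marks": ["RARE_ATTACH", "RARE_ATTACH"]},
    {"id": 3, "marks": ["DISPREF_TOPIC"]},
]
# RARE_ATTACH outranks (is worse than) DISPREF_TOPIC in this toy ranking.
ranking = ["RARE_ATTACH", "DISPREF_TOPIC"]

survivors = ot_prefilter(analyses, ranking)  # analyses 1 and 3 survive
```

Learning the ranking from corpus data, as the paper proposes, amounts to choosing the order of the `ranking` list so that the filter discards as few corpus-attested analyses as possible.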