Constructing a Parsed Corpus with a Large LFG Grammar

Victoria Rosén, Paul Meurer, and Koenraad de Smedt

Proceedings of LFG05; CSLI Publications On-line

Abstract

The TREPIL project (Norwegian treebank pilot project, 2004-2008) aims to develop and test methods for constructing a Norwegian parsed corpus. Annotation of c-structures, f-structures and mrs-structures is based on automatic parsing with human validation and disambiguation. Parsing is done with a large LFG grammar and the XLE parser. We propose a method for efficient disambiguation based on discriminants, and we have implemented a set of computational tools for this purpose.
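To make the notion of discriminant-based disambiguation concrete, the following is a minimal sketch of the general idea, not the TREPIL tools themselves. It assumes that each analysis can be summarized as a set of properties, that a discriminant is any property holding for some but not all of the remaining analyses, and that the annotator accepts or rejects one discriminant at a time. The function names, the property labels and the English toy example are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical representation: each analysis is a named set of properties.
Analyses = Dict[str, FrozenSet[str]]


def discriminants(analyses: Analyses) -> Set[str]:
    """Return the properties that hold for some, but not all, analyses."""
    if not analyses:
        return set()
    all_props = set().union(*analyses.values())
    shared = set.intersection(*(set(p) for p in analyses.values()))
    return all_props - shared


def decide(analyses: Analyses, prop: str, accept: bool) -> Analyses:
    """Keep only the analyses consistent with one annotator decision."""
    return {
        name: props
        for name, props in analyses.items()
        if (prop in props) == accept
    }


# Toy ambiguity (illustrative only): "I saw the man with the telescope",
# with a lexical ambiguity on "saw" and an attachment ambiguity on the PP.
parses: Analyses = {
    "A1": frozenset({"subj=I", "verb=see", "pp-attach=VP"}),
    "A2": frozenset({"subj=I", "verb=see", "pp-attach=NP"}),
    "A3": frozenset({"subj=I", "verb=saw", "pp-attach=VP"}),
    "A4": frozenset({"subj=I", "verb=saw", "pp-attach=NP"}),
}

# "subj=I" is shared by every analysis, so it is not a discriminant.
initial = discriminants(parses)

# First decision: the annotator accepts the reading "verb=see".
after_first = decide(parses, "verb=see", accept=True)

# Second decision: the annotator rejects attachment to the NP.
after_second = decide(after_first, "pp-attach=NP", accept=False)

print(sorted(initial))
print(sorted(after_first))
print(sorted(after_second))
```

In this toy case two decisions suffice to single out one analysis among four. The annotator never has to inspect complete structures, only small local properties, which is what makes the approach attractive when a large grammar produces many analyses per sentence.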