Relational-Realizational Syntax: An Architecture for Specifying and Learning Morphosyntactic Descriptions

Reut Tsarfaty

Abstract

This paper presents a novel architecture for specifying rich morphosyntactic representations and learning the associated grammars from annotated data. The key idea underlying the architecture is the application of the traditional notion of a "paradigm" to the syntactic domain. N-place predicates associated with paradigm cells are viewed as relational networks that are realized recursively by combining and ordering cells from other paradigms. The complete morphosyntactic representation of a sentence is then viewed as a nested, integrated structure that interleaves function and form by means of realization rules. This architecture, called Relational-Realizational, has a simple instantiation as a generative probabilistic model whose parameters can be statistically learned from treebank data. An application of this model to Hebrew allows for an accurate description of the word-order and argument-marking patterns familiar from traditional Semitic grammars. The associated treebank grammar can be used for statistical parsing and is shown to improve on state-of-the-art parsing results for Hebrew. The availability of a simple, formal, robust, implementable and statistically interpretable working model opens new horizons in computational linguistics: at least in principle, we should now be able to quantify typological trends that have so far been stated informally or only tacitly reflected in corpus statistics.
