Abstract
This paper describes a method for efficiently computing the statistics needed for parsing and learning Stochastic Lexical-Functional Grammars (SLFGs) from packed parse representations of the kind produced by the XLE parsing system.
SLFG extends LFG by defining a probability distribution over the set of well-formed structures generated by an LFG (Johnson 1999). This permits an SLFG to identify one analysis as being more likely than another, which is useful for disambiguating ambiguous sentences. Formally, an SLFG can be seen as a generalization of an OT grammar in which constraints are given real-valued weights (rather than being merely rank-ordered). SLFGs can also be used for other tasks, such as identifying the most likely word string with a given f-structure, or for identifying the most likely translation of a given sentence.
An SLFG consists of an LFG and a set of properties, each of which is associated with a real-valued weight. Properties identify the characteristics that distinguish likely analyses from unlikely ones; the weight (which may be any positive or negative real number) determines whether the presence of the property makes an analysis more or less likely, and by how much. For example, an SLFG may contain the properties ADJUNCT ATTACHMENT and ARGUMENT ATTACHMENT, and if the former has a lower weight than the latter, argument attachment will be judged more likely (all else equal).
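Concretely, a distribution of this kind is a log-linear (maximum-entropy) model. Writing f_j(ω) for the number of times property j occurs in analysis ω and θ_j for its weight, the probability of ω among the analyses generated by the underlying LFG takes the standard form below (the notation is illustrative, not taken from the paper):

```latex
P_\theta(\omega) \;=\; \frac{1}{Z_\theta}\,
  \exp\Bigl(\sum_j \theta_j f_j(\omega)\Bigr),
\qquad
Z_\theta \;=\; \sum_{\omega'} \exp\Bigl(\sum_j \theta_j f_j(\omega')\Bigr).
```

Because the weights appear inside the exponential, raising θ_j makes analyses containing property j more probable, which matches the attachment example above.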
A large-scale broad-coverage SLFG may contain many thousands of properties (e.g., associated with the different possible subcategorization frames of each lexical item), so identifying property weights manually is not practical. Luckily, standard statistical methods can be adapted to estimate optimal property weights given a corpus of training analyses.
However, when working with a large, broad-coverage LFG, the number of possible analyses of a sentence may be very large, and merely enumerating all of its parses (in order to find the most likely one) becomes very time-consuming. Moreover, the statistical method for finding optimal property weights given by Johnson et al. requires repeated enumeration of the parses of each sentence in the training corpus (it uses a numerical method to adjust the property weights to make the corpus as likely as possible), so weight estimation can be very slow indeed.
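The enumeration cost referred to here is easy to see in a toy sketch. The statistics the estimator needs per sentence are the expected property counts under the current weights; computing them naively touches every parse. The data layout (parses as property-count dictionaries) and the function name are invented for illustration:

```python
import math

def expected_counts(parses, theta):
    """Naively compute expected property counts over the parses of one
    sentence under a log-linear model.  Each parse is a dict mapping
    property name -> count; theta maps property name -> weight.  This
    enumerates every parse, which is exactly the cost that computing
    from packed representations avoids."""
    scores = [math.exp(sum(theta.get(p, 0.0) * c for p, c in parse.items()))
              for parse in parses]
    z = sum(scores)
    expect = {}
    for parse, s in zip(parses, scores):
        for p, c in parse.items():
            expect[p] = expect.get(p, 0.0) + (s / z) * c
    return expect

# Two parses of an ambiguous sentence, differing in one attachment choice.
parses = [{"ARGUMENT-ATTACH": 1}, {"ADJUNCT-ATTACH": 1}]
theta = {"ARGUMENT-ATTACH": 1.0, "ADJUNCT-ATTACH": 0.0}
print(expected_counts(parses, theta))
```

With thousands of properties and exponentially many parses per sentence, repeating this inner loop on every iteration of the numerical optimizer is what makes naive estimation impractical.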
The problem is exacerbated by the trend to redescribe, using the properties of the "soft" stochastic component, linguistic phenomena that are traditionally described in terms of the "hard" constraints of LFG. For example, in order to achieve broad coverage an SLFG might permit every verb to appear in every subcategorization frame (rather than assigning a particular set of frames to each verb), and could use lexicalized properties to capture subcategorization preferences. Unfortunately, this redescription greatly increases the number of analyses generated by the underlying LFG, and so makes both parsing and estimation extremely slow.
Earlier work by Maxwell and Kaplan (1995) provides a dynamic programming parsing algorithm for (non-stochastic) LFG. In the worst case the Maxwell and Kaplan parsing algorithm can require time exponential in both grammar size and sentence length, but in many practical situations it produces a compact "packed" representation of the set of parses of a sentence. This paper proposes a dynamic programming algorithm for computing the statistics needed for parsing and estimation of SLFGs directly from the Maxwell and Kaplan packed representations, without enumerating each parse separately. This algorithm should make both parsing and estimation of large, broad-coverage SLFGs feasible.
The dynamic programming SLFG algorithms are too complicated to describe in detail in this abstract. However, they rely crucially on the contexted representations used in Maxwell and Kaplan's algorithm. In a contexted representation, analyses are treated as a set of pieces of structure. Each piece of an analysis is associated with a context that specifies which analyses that piece appears in (thus a piece common to many analyses need only be defined once).
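A minimal sketch of the contexted idea, with contexts modeled as functions of boolean "choice" variables (the piece names, variable names, and data layout are illustrative, not XLE's actual format):

```python
from itertools import product

# A contexted representation: each piece of structure is stored once,
# paired with a context -- here, a predicate over a boolean choice
# variable A -- saying which analyses the piece belongs to.
pieces = [
    ("NP-subj",        lambda A: True),    # shared by every analysis
    ("PP-as-adjunct",  lambda A: A),       # only analyses where A is true
    ("PP-as-argument", lambda A: not A),   # only analyses where A is false
]

def analyses(pieces, n_vars=1):
    """Enumerate analyses by enumerating assignments to choice variables:
    each assignment selects the pieces whose contexts it satisfies."""
    for assignment in product([True, False], repeat=n_vars):
        yield [name for name, ctx in pieces if ctx(*assignment)]

for a in analyses(pieces):
    print(a)
```

The shared piece ("NP-subj") is stored once but appears in both analyses, which is the compactness the packed representation exploits.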
The algorithms described here require that SLFG properties be local in a certain sense relative to the pieces of structure of contexted representations. If the SLFG properties have this locality (which can usually be achieved by rewriting the grammar), then the parsing and estimation statistics for the set of all possible analyses can be determined by combining the corresponding statistics for each of the pieces of structure. Just as with Maxwell and Kaplan's algorithm, the SLFG dynamic programming algorithms are worst-case exponential, but often polynomial in practice.
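A toy illustration of why locality helps: when each property mentions only a single piece, and the choice points are independent, the sum of unnormalized scores over all analyses factorizes into a product over choice points, so it never needs per-analysis enumeration. This is only a sketch of the factorization idea under a strong independence assumption, not the paper's algorithm:

```python
import math
from itertools import product

# Two independent binary choice points (e.g. two attachment
# ambiguities); each competing piece carries a local property weight.
choice_points = [
    [1.0, 0.0],   # weights of the two pieces competing at choice 1
    [0.5, -0.5],  # weights of the two pieces competing at choice 2
]

# Packed computation: one product over choice points,
# cost linear in the number of pieces.
z_packed = math.prod(sum(math.exp(w) for w in alts) for alts in choice_points)

# Brute force: enumerate all 2 x 2 analyses and sum their scores,
# cost exponential in the number of choice points.
z_enum = sum(math.exp(w1 + w2) for w1, w2 in product(*choice_points))

print(z_packed, z_enum)  # equal up to floating-point rounding
```

When choice points interact (as contexts in real packed representations do), the combination step is correspondingly more involved, which is where the worst-case exponential behavior noted above comes from.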