3rd WG2 meeting parallel session "Parsing MWEs using pre/post processing"

1. Introduction of the participants

Backgrounds and interests of the participants:

dependency parsing, constituent-based parsing
MWE recognition to facilitate parsing
information extraction
acronym acquisition
machine translation
dictionary acquisition
NLP applications with respect to linguistic resources
relation extraction from texts
language acquisition

2. Individual contributions

Yannick: MWE-aware lexical selection (a.k.a. supertagging) for TAG
drawback: limited MWE support (parse ranking)

Giuseppe: MWE-aware tools

* Dependency parsing (transition-based shift-reduce parsing)
   http://desr.sourceforge.net (trained on 28 languages)
   (robust, hundreds of sentences per second, no grammar needed, only an annotated treebank)
   point: annotating a treebank is easier than developing a deep grammar
* Word embeddings (computed offline) for word labelling (including POS tagging, MWE recognition, word clustering)
   point: no reliance on hand-crafted feature definition and refinement
   quality measurement? not intrinsically, only extrinsically via tagging/parsing improvement
   advantage: POS tags can be replaced with clusters for improved parsing (open question: how does the number of clusters compare with the number of POS tags?)
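The transition-based shift-reduce parsing mentioned above can be illustrated with a minimal sketch of the textbook arc-standard system (SHIFT, LEFT-ARC, RIGHT-ARC). This is an assumption for illustration only: DeSR's actual transition set and classifier may differ. The function names `oracle` and `apply_transitions` are hypothetical. The oracle derives, from a gold tree of an annotated treebank, the transition sequence that rebuilds it; replaying that sequence recovers the tree.

```python
from typing import List

# Transition names of the textbook arc-standard system (an assumption:
# DeSR's own action set may differ).
SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def oracle(heads: List[int]) -> List[str]:
    """Derive the transition sequence that rebuilds a gold tree.

    heads[i] is the head of token i + 1 (tokens are 1-indexed, 0 is ROOT).
    Only projective trees can be rebuilt by this transition system.
    """
    n = len(heads)
    gold = {dep: head for dep, head in enumerate(heads, start=1)}
    # How many dependents each node (ROOT included) still has to attach.
    pending = {node: 0 for node in range(n + 1)}
    for head in heads:
        pending[head] += 1
    stack, buffer, transitions = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            # LEFT-ARC: top becomes the head of the node below it.
            if below != 0 and gold[below] == top:
                transitions.append(LEFT_ARC)
                stack.pop(-2)
                pending[top] -= 1
                continue
            # RIGHT-ARC: below heads top, once top has all its dependents.
            if gold[top] == below and pending[top] == 0:
                transitions.append(RIGHT_ARC)
                stack.pop()
                pending[below] -= 1
                continue
        if not buffer:
            raise ValueError("non-projective tree: no valid transition")
        transitions.append(SHIFT)
        stack.append(buffer.pop(0))
    return transitions


def apply_transitions(n: int, transitions: List[str]) -> List[int]:
    """Run a transition sequence over n tokens and return the predicted heads."""
    heads = [-1] * n  # -1 marks a token that has not been attached yet
    stack, buffer = [0], list(range(1, n + 1))
    for t in transitions:
        if t == SHIFT:
            stack.append(buffer.pop(0))
        elif t == LEFT_ARC:
            dependent = stack.pop(-2)
            heads[dependent - 1] = stack[-1]
        elif t == RIGHT_ARC:
            dependent = stack.pop()
            heads[dependent - 1] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads


# "She eats fresh apples": She->eats, eats->ROOT, fresh->apples, apples->eats
gold_heads = [2, 0, 4, 2]
seq = oracle(gold_heads)
print(seq)
# ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'SHIFT', 'LEFT-ARC', 'RIGHT-ARC', 'RIGHT-ARC']
print(apply_transitions(len(gold_heads), seq))
# [2, 0, 4, 2]
```

In a real parser, a classifier trained on such oracle sequences replaces the oracle and picks each transition from features of the stack and buffer; this is why only an annotated treebank, and no grammar, is needed. Each token is shifted once and attached once, so a sentence of n tokens takes 2n transitions, i.e. linear time.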
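The idea of replacing POS tags with clusters of word embeddings can be sketched as follows. The notes do not say which clustering method was used; plain k-means is an assumption chosen here for illustration, and the 2-D vectors are hypothetical toy values standing in for embeddings learned offline. The functions `kmeans` and `cluster_tags` are likewise hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def kmeans(vectors: Dict[str, Sequence[float]], k: int, iters: int = 20) -> Dict[str, int]:
    """Cluster word vectors with plain k-means; return word -> cluster id."""
    words = sorted(vectors)
    if k > len(words):
        raise ValueError("k must not exceed the number of words")
    # Deterministic initialisation: the first k words (alphabetically) seed the centroids.
    centroids = [list(vectors[w]) for w in words[:k]]
    assignment: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each word goes to its nearest centroid.
        new_assignment = {
            w: min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged: no word changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def cluster_tags(tokens: List[str], clusters: Dict[str, int], unknown: str = "UNK") -> List[str]:
    """Replace each token by its cluster label, to be used where a POS tag would go."""
    return [f"C{clusters[t]}" if t in clusters else unknown for t in tokens]


# Hypothetical 2-D "embeddings"; real ones would be learned offline from raw text.
toy_vectors = {
    "cats": (1.0, 0.1),
    "dogs": (0.9, 0.2),
    "runs": (0.1, 1.0),
    "sleeps": (0.2, 0.9),
}
clusters = kmeans(toy_vectors, k=2)
print(cluster_tags(["dogs", "sleeps", "unicorns"], clusters))
# ['C0', 'C1', 'UNK']
```

Here `k` is a free parameter, which is exactly the open question in the notes: it can be set smaller or larger than the size of a POS tag set, and its effect would be judged extrinsically through parsing accuracy.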

Kayla: acronym acquisition and disambiguation
   how to deal with non-adjacent expansions?
   growing number of acronyms -> need for automatic, reproducible acquisition methods
   link between acronym recognition and coreference?
   formation patterns? (half of Hebrew acronyms use additional letters, not only initial ones) -> language-dependent
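The formation-pattern issue above can be made concrete with a simplified version of the Schwartz and Hearst (2003) abbreviation-matching heuristic. This is a swapped-in, well-known technique used only for illustration, not necessarily the method discussed by Kayla; the functions `find_long_form` and `extract_pairs` are hypothetical names. Short-form characters are matched right to left inside the preceding words; inner letters may come from anywhere in a word, but the first one must start a word.

```python
import re
from typing import List, Optional, Tuple


def find_long_form(short: str, candidate: str) -> Optional[str]:
    """Match the short form's characters right-to-left inside the candidate text.

    Inner letters may come from anywhere in a word, but the first character
    of the short form must start a word (simplified Schwartz-Hearst matching).
    """
    s, l = len(short) - 1, len(candidate) - 1
    while s >= 0:
        c = short[s].lower()
        if not c.isalnum():
            s -= 1
            continue
        # Move left until c matches (and, for the first letter, starts a word).
        while l >= 0 and (
            candidate[l].lower() != c
            or (s == 0 and l > 0 and candidate[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Extend the match leftwards to the start of the word containing it.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Find 'long form (SF)' patterns and return (short form, long form) pairs."""
    pairs = []
    for m in re.finditer(r"\(([^()\s]{2,10})\)", sentence):
        short = m.group(1)
        # Candidate window: at most min(|SF| + 5, 2 * |SF|) preceding words.
        max_words = min(len(short) + 5, 2 * len(short))
        words = sentence[: m.start()].split()
        candidate = " ".join(words[-max_words:])
        long_form = find_long_form(short, candidate)
        if long_form is not None:
            pairs.append((short, long_form))
    return pairs


print(extract_pairs("We parse multiword expressions (MWE) with a hidden Markov model (HMM)."))
# [('MWE', 'multiword expressions'), ('HMM', 'hidden Markov model')]
print(extract_pairs("the extensible markup language (XML) format"))
# []
```

The "W" of MWE is found inside "multiword", so a heuristic restricted to word-initial letters would miss it, while XML is missed because its first letter is not word-initial. Such choices are language-dependent, and the sketch only covers the adjacent "long form (SF)" pattern, so non-adjacent expansions remain unhandled.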

Proposals for the next WG2 meeting:

* psycholinguistics, semantics, parallel semantic corpora (cf. Peter, Norway-Iceland, Frasar.net)
* summaries to be put online
* select common topics/issues that several people are addressing,
e.g. information extraction from parse outputs (cf. idiomatic meaning)
* meeting structure prepared in advance