1. Introduction of the audience
Backgrounds and interests of the participants:
- dependency parsing, constituent-based parsing
- MWE recognition to facilitate parsing
- information extraction
- acronym acquisition
- machine translation
- dictionary acquisition
- NLP applications with respect to linguistic resources
- relation extraction from texts
- language acquisition
2. Individual contributions
Yannick: MWE-aware lexical selection (aka supertagging) for TAG
drawback: limited MWE support (parse ranking)
Giuseppe: MWE-aware tools
* Dependency parsing (transition-based shift-reduce parsing)
http://desr.sourceforge.net (trained on 28 languages)
(robust, hundreds of sentences per second, no grammar needed, only an annotated treebank)
point: annotation is easy compared with deep grammar development
* Word embeddings (computed offline) for word labelling (including POS tagging, MWE recognition, word clustering)
point: no reliance on feature definition and refinement
quality measurement? not per se, but indirectly via tagging/parsing improvement
advantage: POS tags can be replaced with clusters for improved parsing (what about the number of classes vs. POS tags?)
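For readers unfamiliar with transition-based shift-reduce parsing, here is a minimal sketch of an arc-standard transition system in Python. It is not DeSR's implementation: the action names (SHIFT, LEFT, RIGHT), the function `parse`, and the example sentence are assumptions made for illustration. The action sequence is supplied by hand here, whereas a real parser predicts each action with a classifier trained on an annotated treebank.

```python
def parse(words, actions):
    """Apply arc-standard actions and return the head index of each word.

    heads[i] is the index of the head of words[i]; the root keeps -1.
    Hypothetical toy code: a real parser predicts the actions.
    """
    stack = []
    buffer = list(range(len(words)))
    heads = [-1] * len(words)
    for action in actions:
        if action == "SHIFT":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            # Second-from-top becomes a dependent of the top.
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif action == "RIGHT":
            # Top becomes a dependent of the second-from-top.
            dep = stack.pop()
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


words = ["She", "kicked", "the", "bucket"]
actions = ["SHIFT", "SHIFT", "LEFT", "SHIFT", "SHIFT", "LEFT", "RIGHT"]
heads = parse(words, actions)
for word, head in zip(words, heads):
    print(word, "<-", words[head] if head >= 0 else "ROOT")
```

Each word is shifted once and attached at most once, so parsing is linear in sentence length, which is consistent with the speed noted above.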
Kayla: acronym acquisition and disambiguation
how to deal with non-adjacent expansions?
growing number of acronyms -> need for automatic, reproducible acquisition methods
link between acronym recognition and coreference?
formation patterns? (half of Hebrew acronyms use additional letters, not only leading ones) -> language-dependent
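The point about formation patterns can be illustrated with a small sketch in Python contrasting a strict leading-letter check with a relaxed one that lets a word contribute more than its first letter. This is not the method presented at the meeting: both function names and the English examples are assumptions, and the relaxed check over-generates, so it would need language-specific constraints in practice.

```python
def leading_letters_match(acronym, expansion):
    """True if the acronym is exactly the initials of the expansion's words."""
    initials = "".join(word[0] for word in expansion.split())
    return initials.lower() == acronym.lower()


def subsequence_match(acronym, expansion):
    """True if the acronym's letters occur, in order, among the expansion's letters.

    Relaxed check: a word may contribute several letters or none at all.
    """
    letters = iter(c for c in expansion.lower() if c.isalpha())
    # `ch in letters` consumes the iterator, which enforces left-to-right order.
    return all(ch in letters for ch in acronym.lower())


print(leading_letters_match("NLP", "natural language processing"))
print(leading_letters_match("radar", "radio detection and ranging"))
print(subsequence_match("radar", "radio detection and ranging"))
```

The strict check accepts "NLP" but rejects "radar", which takes two letters from "radio"; the relaxed check accepts both.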
Proposals for the next WG2 meeting:
* psycholinguistics, semantics, parallel semantic corpora (cf. Peter, Norway-Iceland, Frasar.net)
* summaries to be put online
* select common topics/issues:
e.g. information extraction from parse outputs (cf idiomatic meaning)
(several people addressing a question)
* meeting structure to be prepared in advance