PARSEME follow-up proposal ===

Proposers' names and affiliations

Alexandr Rosen, Eduard Bejček, Pavel Straňák
Institute of Theoretical and Computational Linguistics and Faculty of Arts
Charles University
Prague, Czech Republic

==== Topics: Translation, Parallel Data and MWEs

The European research network built within the PARSEME project is not only an optimal environment for multilingual and cross-lingual MWE research in general, including common guidelines, taxonomy and methodologies. It also provides a rare opportunity to follow up on its results, both theoretical and application-oriented, by designing, adapting and building tools and resources targeting MWEs in a multilingual setting.

The cross-lingual MWE taxonomy and treebank annotation scheme can support tasks such as annotating MWEs in both monolingual and multilingual corpora, projecting MWE annotation to parallel texts, developing a multilingual lexical database of MWEs, and using the results in a machine or machine-aided translation system. Additionally, the project could focus on multilingual terminology, perhaps in the medical domain.

Possible goals:

• To build a set of parallel corpora for pairs or sets of several languages; MWEs will be hand-annotated in parts of each parallel corpus
• To align parallel texts by words and to compare the MWE annotations and their mutual links: number of words, range, overlap, ...
• To project MWE annotation from a source language with MWE annotation to a target language without it
• To predict MWEs in parallel texts without MWE annotation, based on word-to-word alignments
• To focus in more detail on terminology in a specific domain; to build parallel aligned corpora of multi-word terms, linking each term to its definition in an encyclopaedia
• To improve translation using data and knowledge from the tasks above

The name could refer to PARSEME, for example TRANSEME (however, this word already exists: "comprehensible textual unit for translation").

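As an illustration of the projection goal listed above, the following minimal sketch shows how MWE annotation might be carried from a source sentence to a target sentence through word-to-word alignments. It is only an assumption about one possible approach, not a description of an existing PARSEME tool: the function name `project_mwes`, the data layout (token indices and `(source, target)` index pairs), and the example sentence pair with its alignment are all hypothetical.

```python
from collections import defaultdict


def project_mwes(source_mwes, alignment):
    """Project MWE annotation from source tokens to target tokens.

    source_mwes: dict mapping an MWE id to a set of source token indices
                 (hypothetical layout, chosen for illustration).
    alignment:   iterable of (source_index, target_index) pairs, assumed
                 to come from any word aligner.
    Returns a dict mapping each MWE id to the sorted list of target token
    indices aligned to at least one token of that MWE.
    """
    # Index the alignment by source token for quick lookup.
    src_to_tgt = defaultdict(set)
    for src, tgt in alignment:
        src_to_tgt[src].add(tgt)

    projected = {}
    for mwe_id, src_tokens in source_mwes.items():
        tgt_tokens = set()
        for src in src_tokens:
            # Unaligned source tokens simply contribute nothing.
            tgt_tokens |= src_to_tgt.get(src, set())
        projected[mwe_id] = sorted(tgt_tokens)
    return projected


# Hypothetical example: "He kicked the bucket" / "Natáhl bačkory".
# Source tokens: 0 He, 1 kicked, 2 the, 3 bucket
# Target tokens: 0 Natáhl, 1 bačkory
example_alignment = [(0, 0), (1, 0), (2, 1), (3, 1)]
example_mwes = {"mwe1": {1, 2, 3}}  # "kicked the bucket"
print(project_mwes(example_mwes, example_alignment))
```

The projected spans could then be compared with hand-made target annotation (number of words, range, overlap) wherever such annotation exists; an empty projected list marks an MWE with no aligned counterpart.
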
Another possibility for a grant proposal could be connected to the PARSEME Shared Task. We could create another set of annotation guidelines, for nominal expressions or for named entities; work towards a standard for annotation together with the results of WG4; collaborate with Universal Dependencies; and organize further shared task(s).