This page lists the PARSEME deliverables, as defined in the Memorandum of Understanding.

  1. Contrastive analysis of the linguistic properties of MWEs in different European languages.
  2. Proposal of a common design for lexicons including both valence data and MWE data.
    • Publication on contrastive analysis of the design of valence MWE dictionaries in Czech and Polish:
      Adam Przepiórkowski, Jan Hajič, Elżbieta Hajnicz, and Zdeňka Urešová. Phraseology in two Slavic valency dictionaries: Limitations and perspectives. International Journal of Lexicography, 30(1):1–38, 2017
    • WG1 workshop on lexical encoding of MWEs, based on the DuELME formalism meant to be theory- and grammar-independent, and interoperable with valence-aware grammars
  3. Lexical databases: possibly interoperable parsing-oriented MWE lexicons and valence dictionaries in several European languages.
  4. Extensions of existing corpora and treebanks in several languages with MWE annotation levels.
    • Annotation guidelines for 18 languages in the PARSEME shared task on automatic identification of verbal MWEs
    • Course on MWEs and the Prague Dependency Treebank
    • PARSEME-FR, a French PARSEME spin-off project with MWE annotation as one of its main objectives
    • Papers on projecting MWE resources on treebanks
  5. Extensions of existing grammars for several European languages with rules dedicated to MWEs.
    • Course on Multi-word Expressions in HPSG
    • Posters and papers on integrating MWEs in symbolic grammars (in English, Greek, Hebrew and Polish)
  6. Definitions of abstract models (e.g. meta-grammars) of MWEs’ properties that would: (i) capture the linguistic richness of MWEs independently of particular grammatical frameworks, (ii) help reduce the cost of resource development, (iii) adapt to the different languages studied.
    • 2 tutorials on XMG, a meta-grammar framework for efficient development of lexicalized grammars with MWEs
    • Papers and posters on MWE encoding in XMG
    • A tutorial on integrating MWEs in FRMG, a French Meta-Grammar
    • WG1 workshop on lexical encoding of MWEs, based on the DuELME formalism meant to be theory- and grammar-independent, and interoperable with valence-aware grammars
  7. Recommendations of best practices for MWE representation and treatment in parsing within different theoretical frameworks.
    • Papers on joint parsing and MWE identification
    • Tutorials on MWEs in FRMG and in the Grammatical Framework
    • Course on "Dependency grammar, dependency parsing and MWEs"
    • WG2 book "Representation and Parsing of Multiword Expressions"
  8. Extension of hybrid (knowledge-based and data-driven) methods for parsing MWEs.
    • WG3 survey on hybrid processing of MWEs
    • Papers on a novel architecture of joint dependency parsing and MWE identification
    • Papers on promoting MWEs in TAG parsing
  9. Annotation guidelines for the representation of MWEs in treebanks.
    • WG4 survey on annotating MWEs in treebanks
    • 2 papers describing the survey and paving the way towards guidelines
  10. A common publishing platform gathering initiatives in the field of MWEs and parsing.
    • This website
    • Publicly available Google table from the WG1 MWE lexicon survey
    • Publicly available Wiki table from the WG4 survey on annotating MWEs in treebanks
  11. Scientific publications in established conferences and journals in various domains (see the pages dedicated to papers and proceedings).

This page contains links to PARSEME outcomes other than those listed in dedicated pages.

STSMs:

PARSEME funded 39 Short Term Scientific Missions (STSMs) for 35 researchers, totalling 49 months, with the following distribution:

  • early-stage researchers: 30 STSMs (77%); senior researchers: 9 STSMs (23%)
  • male: 20 STSMs (51%); female: 19 STSMs (49%)
  • 25 countries were involved in total (as sending or hosting countries)
    • STSMs coming from an inclusiveness country: 14 (36%); STSMs coming from a non-inclusiveness country: 25 (64%)
    • STSMs going to an inclusiveness country: 8 (21%); STSMs going to a non-inclusiveness country: 31 (79%)
  • Average STSM duration: 29 days
  • All reports are available online

Members' lists:

PARSEME gathers members of 2 categories:

  • Management Committee members and substitutes were nominated by the participating countries as their official representatives. The MC list is maintained by COST.
  • Working Group members are admitted according to PARSEME internal rules. The WG members' list, containing profiles and contacts of the members, is one of our networking instruments.

Spin-off projects:

  • Five PARSEME spin-off projects received national funding in the Czech Republic, France, Lithuania, Poland and Slovenia.

Success stories:

  • PARSEME was shortlisted by COST for a presentation at the European Conference for Science Journalists (Copenhagen, 26-30 June 2017), the largest gathering of science journalists in Europe that year.
  • Glorianna Jagfeld's bachelor thesis "Towards a Better Semantic Role Labeling of Complex Predicates", supervised by Lonneke van der Plas, received the German national GSCL prize for the best Bachelor thesis in Computational Linguistics as well as the local Infos prize (dimension of the success: ESR support).

Theses (to complete):

Negative Polarity MWEs (NPMWEs) are a theoretically and practically challenging class since their obligatory licensing environments can be abstract grammatical, semantic, and even pragmatic categories; this makes them difficult to identify and classify. Such special lexical units have already been researched within the PARSEME community for Polish, German, and Romanian, and we intend to share and discuss our methodologies in order to:

  • Document and classify NPMWEs for multiple languages.
  • Verify the effectiveness of the tests we have already developed for individual languages and, after comparing NPMWEs across languages, investigate whether other tests should be used.
  • Develop a set of tests that proves effective for classifying and identifying NPMWEs across languages.
  • Research the distributional properties of NPMWEs in different languages.
  • Develop a multilingual resource (such as an electronic dictionary) of negative polarity items.

The SIGLEX-MWE section and PARSEME are co-organizing the annual Multiword Expressions Workshop on 4 April 2017, co-located with the EACL 2017 conference. It includes a special track dedicated to the PARSEME shared task on automatic identification of verbal MWEs.


PARSEME grants

PARSEME will fund travel and stay for 33 workshop participants from the PARSEME member countries. Applicants should fill in the application form by 15 February 2017. The PARSEME Steering Committee will select the applicants entitled to reimbursement. Priority is given to:

  • workshop and shared task organizers, technical experts and language group leaders,
  • shared task language leaders,
  • authors of the best systems in the shared task,
  • presenters of papers/posters,
  • shared task annotators,
  • early-stage researchers,
  • PARSEME members.

The reimbursement rates:

  • Hotel: 120 EUR per night (flat rate). The number of reimbursed nights is equal to the number of attended workshop days plus 1 (to cover the case where the participant arrives before her/his first attended day and leaves after her/his last attended day). An attendance list must be signed on each day of presence at the workshop.
  • Meals: 20 EUR per meal (flat rate).
  • Travel: real costs limited to 1200 EUR (economy class air tickets, train tickets, local transport, etc.).
  • Workshop admission fees are not eligible for reimbursement.

Detailed reimbursement rules are defined in the COST Vademecum, section 4, pp. 19-23. Applicants selected for funding will receive a formal invitation via the e-COST system, which they should accept before travelling. They should pay for their travel and stay in advance and will be reimbursed afterwards.
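The flat rates above can be combined into a rough estimate of what a participant gets back. The Python sketch below is a hypothetical illustration, not an official COST calculator: the function estimate_reimbursement and its parameters are invented here, the number of meals is left to the caller because the rates do not say which meals are covered, and the night rule is read as "nights actually stayed, capped at attended days plus 1". The detailed rules in the COST Vademecum take precedence.

```python
# Hypothetical estimate of the reimbursable amount for one participant,
# based on the flat rates listed for the MWE 2017 grants.
# ASSUMPTION: reimbursed nights = nights actually stayed, capped at
# attended workshop days + 1; the meal count is supplied by the caller.

HOTEL_RATE_EUR = 120   # flat rate per night
MEAL_RATE_EUR = 20     # flat rate per meal
TRAVEL_CAP_EUR = 1200  # travel refunded at real cost, up to this cap


def estimate_reimbursement(attended_days: int, nights_stayed: int,
                           meals: int, travel_cost_eur: float) -> float:
    """Return an estimated reimbursement in EUR (admission fees excluded)."""
    if min(attended_days, nights_stayed, meals, travel_cost_eur) < 0:
        raise ValueError("inputs must be non-negative")
    reimbursed_nights = min(nights_stayed, attended_days + 1)
    hotel = reimbursed_nights * HOTEL_RATE_EUR
    food = meals * MEAL_RATE_EUR
    travel = min(travel_cost_eur, TRAVEL_CAP_EUR)
    return hotel + food + travel


if __name__ == "__main__":
    # One-day workshop, arriving the day before and leaving the day after:
    # 2 nights, 3 meals, 450 EUR of tickets -> 240 + 60 + 450.
    print(estimate_reimbursement(attended_days=1, nights_stayed=2,
                                 meals=3, travel_cost_eur=450.0))  # 750.0
```

Workshop admission fees are deliberately absent from the sum, since they are not eligible for reimbursement.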

Important dates:

  • 22 January 2017 (extended from 16 January): Submission deadline for the main track long & short papers
  • 5 February: Submission deadline for shared task system description papers
  • 11 February: Notification of acceptance for the main track papers
  • 12 February: Notification of acceptance for the shared task papers
  • 15 February: Deadline for funding applications
  • 20 February: Camera-ready papers due (main track and shared task)
  • 1 March: Notification to applicants about funding
  • 4 April, 2017: MWE 2017 Workshop

 

This page describes the parseme-tsv-pos format of the input corpora to be uploaded to the FLAT annotation platform in the PARSEME shared task on automatic detection of verbal MWEs. See a sample file for illustration.

The parseme-tsv-pos format is a five-column format derived from the parseme-tsv format in the following way:

  • The fourth column may or may not contain VMWE annotations (in the latter case, the whole column contains underscores '_').
  • The fifth column contains the part-of-speech tag for the current token, or an underscore ('_') if no tag is provided. No specific POS tagset is recommended, and the POS tags can take any form.
  • Comment lines are not allowed.
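A minimal checker for these rules might look like the Python sketch below. It assumes the column layout inherited from parseme-tsv (token rank, surface form, 'nsp' for a token not followed by a space, VMWE codes, then the POS tag), tab-separated columns, blank lines between sentences, comment lines starting with '#', and VMWE codes such as '1:LVC' (first token of expression 1) or '1' (its other tokens), with ';' separating several codes. The function check_parseme_tsv_pos is hypothetical and is not part of FLAT or of the shared task tools.

```python
import re
from typing import List

# Hypothetical checker for parseme-tsv-pos (not part of FLAT or the shared task tools).
# Assumed layout per line: RANK <tab> FORM <tab> NSP <tab> VMWE <tab> POS,
# with blank lines between sentences and comment lines starting with '#'.
VMWE_CODE = re.compile(r"^\d+(:[A-Za-z]+)?$")  # e.g. "1:LVC" or "1"


def check_parseme_tsv_pos(text: str) -> List[str]:
    """Return a list of problems found in the text (empty if none)."""
    problems = []
    expected_rank = 1
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():  # blank line: a new sentence starts
            expected_rank = 1
            continue
        if line.startswith("#"):
            problems.append(f"line {lineno}: comment lines are not allowed")
            continue
        cols = line.split("\t")
        if len(cols) != 5:
            problems.append(f"line {lineno}: expected 5 columns, got {len(cols)}")
            continue
        rank, form, nsp, vmwe, pos = cols
        if not rank.isdigit() or int(rank) != expected_rank:
            problems.append(f"line {lineno}: expected rank {expected_rank}, got {rank!r}")
        # Resynchronise on the rank actually found to avoid cascading errors.
        expected_rank = int(rank) + 1 if rank.isdigit() else expected_rank + 1
        if not form:
            problems.append(f"line {lineno}: empty token form")
        if nsp not in ("_", "nsp"):
            problems.append(f"line {lineno}: column 3 must be 'nsp' or '_'")
        if vmwe != "_" and not all(VMWE_CODE.match(c) for c in vmwe.split(";")):
            problems.append(f"line {lineno}: malformed VMWE code {vmwe!r}")
        if not pos:
            problems.append(f"line {lineno}: empty POS column (use '_')")
    return problems


if __name__ == "__main__":
    sample = "1\tare\t_\t1:LVC\tV\n2\tin\t_\t1\t_\n3\tdoubt\t_\t1\t_\n"
    print(check_parseme_tsv_pos(sample))  # []
    print(check_parseme_tsv_pos("# a comment\n1\tare\t_\t1:LVC\n"))
```

The second call reports both a forbidden comment line and a four-column line, i.e. a plain parseme-tsv line that is missing the POS column.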

Examples:

1	Delegates	_	_	_
2	are	_	1:LVC	V
3	in	_	1	_
4	little	_	_	_
5	doubt	_	1	_
6	that	_	_	_
7	the	_	_	_
8	shadow	_	2:ID	_
9	cast	_	2	Vpp
10	over	_	_	_
11	the	_	_	_
12	city	_	_	_
13	by	_	_	_
14	the	_	_	_
15	attacks	_	_	_
16	will	_	_	V
17	enhance	_	_	VInf
18	the	_	_	_
19	chances	_	_	_
20	of	_	_	_
21	agreement	nsp	_	_
22	.	_	_	_

1	Questioning	_	_	Vger
2	colonial	_	_	_
3	boundaries	_	_	_
4	would	_	_	V
5	open	_	_	Vinf
6	a	_	_	_
7	dangerous	_	_	_
8	Pandora	nsp	_	_
9	'	nsp	_	_
10	s	_	_	_
11	box	nsp	_	_
12	.	_	_	_
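In the first example, tokens 2, 3 and 5 (are, in, doubt) form a light-verb construction numbered 1, and tokens 8 and 9 (shadow, cast) an idiom numbered 2; the second example carries no VMWE annotation. The hypothetical Python helper below sketches how such codes can be read back into expressions, assuming the same five tab-separated columns and that a token belonging to several VMWEs lists their codes separated by ';'. The name group_vmwes is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical helper, assuming tab-separated RANK, FORM, NSP, VMWE, POS
# columns and codes "id:CATEGORY" (first token) / "id" (other tokens).


def group_vmwes(sentence: str) -> Dict[int, Tuple[str, List[str]]]:
    """Map each VMWE id in one sentence to (category, token forms)."""
    categories: Dict[int, str] = {}
    forms: Dict[int, List[str]] = defaultdict(list)
    for line in sentence.strip().splitlines():
        _rank, form, _nsp, vmwe, _pos = line.split("\t")
        if vmwe == "_":  # token not part of any VMWE
            continue
        for code in vmwe.split(";"):  # a token may belong to several VMWEs
            mwe_id, _, category = code.partition(":")
            if category:  # only the first token carries the category
                categories[int(mwe_id)] = category
            forms[int(mwe_id)].append(form)
    return {i: (categories.get(i, "?"), forms[i]) for i in sorted(forms)}


if __name__ == "__main__":
    first_sentence = "\n".join([
        "1\tDelegates\t_\t_\t_",
        "2\tare\t_\t1:LVC\tV",
        "3\tin\t_\t1\t_",
        "4\tlittle\t_\t_\t_",
        "5\tdoubt\t_\t1\t_",
        "6\tthat\t_\t_\t_",
        "7\tthe\t_\t_\t_",
        "8\tshadow\t_\t2:ID\t_",
        "9\tcast\t_\t2\tVpp",
    ])
    print(group_vmwes(first_sentence))
    # {1: ('LVC', ['are', 'in', 'doubt']), 2: ('ID', ['shadow', 'cast'])}
```

Note that the tokens of an expression need not be contiguous: 'little' sits between 'in' and 'doubt' without belonging to expression 1.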

 

Files in this format are useful in the following cases:

  • part-of-speech tags are available for the corpora; in this case, we recommend keeping only the verbal POS tags (including gerunds and participles), which FLAT will then display above the verbal tokens; this may greatly speed up manual annotation, since head verbs are automatically underlined in the FLAT interface; annotators should, however, be aware of the resulting bias, especially if the POS tags are not gold-standard tags;
  • automatic VMWE pre-annotations are available and need manual validation in FLAT;
  • some annotators work off-line in Excel-like spreadsheets.
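For the first case, the recommendation to keep only verbal tags can be applied with a small preprocessing step. The Python sketch below is hypothetical: since no tagset is prescribed, the test for a verbal tag is passed in as a function, and the default (tags starting with 'V', as in the examples on this page) is only an illustration that would need adapting to the tagset actually used. Non-verbal tags are replaced by '_' rather than deleted, so every line keeps its five columns.

```python
from typing import Callable

# Hypothetical preprocessing step for parseme-tsv-pos files: blank out
# non-verbal POS tags so that only verbal tokens carry a tag in FLAT.
# ASSUMPTION: the default test (tag starts with "V") only matches the tags
# used in the examples on this page; adapt it to the real tagset.


def starts_with_v(tag: str) -> bool:
    return tag.startswith("V")


def keep_verbal_pos(text: str,
                    is_verbal: Callable[[str], bool] = starts_with_v) -> str:
    """Replace every non-verbal tag in column 5 by '_'."""
    out = []
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) == 5 and cols[4] != "_" and not is_verbal(cols[4]):
            cols[4] = "_"  # keep the column, drop the tag
        out.append("\t".join(cols))  # blank lines pass through unchanged
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    tagged = "1\tQuestioning\t_\t_\tVger\n2\tcolonial\t_\t_\tADJ\n"
    print(keep_verbal_pos(tagged), end="")
```

With a different tagset, a predicate such as lambda t: t == "VERB" could be passed instead of the default.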