This page summarizes the outcomes of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs), along with basic facts about it.

Basic facts:

  • The PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) is based on a considerable collective effort undertaken by the European PARSEME COST action.
  • The shared task edition 1.0 is now complete.
  • Training and test corpora were released for 18 languages.
  • 7 systems participated, 5 of them multilingual, and all 18 languages were covered (see the results).
  • The 13th MWE workshop, held on 4 April 2017 in Valencia, Spain, was the culminating event. It featured the shared task presentation paper and 6 system presentation posters (cf. the proceedings).

Outcomes and infrastructure:

  • Universal guidelines, with examples in many languages and room for language-specific specifications.
  • The final corpus of 5.5 million tokens and 60,000 VMWE annotations in 18 languages, distributed under different versions of the Creative Commons license.
  • Project management, with a structure based on language groups and roles (organizers, language group leaders, language leaders, annotators, etc.).
  • Customizable annotation platform FLAT.
  • Dedicated tools to verify coherence and silence.
  • File formats (including parsemetsv), validators, converters and evaluation tools.
  • Communication tools: mailing lists, Git issue tracker, websites.
  • Data repositories (GitLab).

Future work:

  • PARSEME gathered a large group of highly motivated contributors.
  • After the end of the PARSEME action in late April 2017, future activities of this community, extended to a larger international context, will be coordinated by the SIGLEX-MWE section. New members are welcome and may join by subscribing to the section's mailing list.
  • We plan editions 1.1 and 2.0 of the shared task in 2018 and 2019, with new languages, enhanced guidelines and corpora, as well as an extended scope.