This page gathers the outcomes of, and facts about, the PARSEME shared task on automatic identification of verbal MWEs (VMWEs).

Basic facts:

  • The PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) is based on a considerable collective effort undertaken by the European PARSEME COST action.
  • The shared task editions 1.0 and 1.1 are now complete.
  • In edition 1.0:
    • 18 languages released their training and test corpora
    • 7 systems participated, 5 of them multilingual, and all 18 languages were covered (see the results)
    • The 13th MWE workshop on 4 April 2017 in Valencia, Spain was the culminating event. It featured the shared task presentation paper and 6 system presentation posters (cf. the proceedings).
  • In edition 1.1:
    • 20 languages released their training and test corpora
    • 17 systems participated, all of them multilingual, and all 19 languages were covered (see the results)
    • The LAW-MWE-CxG workshop on 25-26 August 2018 in Santa Fe, USA was the culminating event. It featured the shared task presentation paper and 8 system presentation posters (cf. the proceedings).

Outcomes and infrastructure:

  • Universal guidelines, with examples in many languages and room for language-specific specifications.
  • The final corpus released via the CLARIN/LINDAT infrastructure:
    • edition 1.0: 5.5 million tokens and 60,000 VMWE annotations in 18 languages, distributed under different versions of the Creative Commons license.
    • edition 1.1: 6 million tokens and 79,000 VMWE annotations in 19 languages, distributed under different versions of the Creative Commons license.
  • Project management, with a structure based on language groups and roles (organizers, language group leaders, language leaders, annotators, etc.).
  • Customizable annotation platform FLAT
  • Dedicated tools to verify coherence and silence
  • File formats (including parsemetsv and cupt), validators, converters and evaluation tools
  • Communication tools: mailing lists, git issue tracker, websites
  • Data repositories (gitlab)

Future work:

  • PARSEME gathered a large group of highly motivated contributors.
  • Since the end of the PARSEME action in late April 2017, the activities of this community, extended to a larger international context, have been coordinated by the SIGLEX-MWE section. New members are welcome to join the section. They may subscribe by following these two steps:
    • joining SIGLEX via the web form (the MWE section should be selected)
    • joining the SIGLEX-MWE section by subscribing to the MWE mailing list
  • We plan edition 1.2 of the shared task for 2020, dedicated to weakly supervised VMWE identification.