Shared task

This page gathers the outcomes of and facts about the PARSEME shared task on automatic identification of verbal MWEs (VMWEs).

Basic facts:

The PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) is based on a considerable collective effort undertaken by the European PARSEME COST action.
Editions 1.0 and 1.1 of the shared task are now complete.
In edition 1.0:
- 18 languages released their training and test corpora
- 7 systems participated, 5 of which were multilingual; all 18 languages were covered (see the results)
- The 13th MWE workshop on 4 April 2017 in Valencia, Spain was the culminating event. It featured the shared task presentation paper and 6 system presentation posters (cf. the proceedings).
In edition 1.1:
- 20 languages released their training and test corpora
- 17 systems participated, all of them multilingual; all 19 languages were covered (see the results)
- The LAW-MWE-CxG workshop on 25-26 August 2018 in Santa Fe, USA was the culminating event. It featured the shared task presentation paper and 8 system presentation posters (cf. the proceedings).

Outcomes and infrastructure:

Universal guidelines, with examples in many languages and room for language-specific specifications.
The final corpus, released via the CLARIN/LINDAT infrastructure:
- edition 1.0: 5.5 million tokens and 60,000 VMWE annotations in 18 languages, distributed under different versions of the Creative Commons license.
- edition 1.1: 6 million tokens and 79,000 VMWE annotations in 19 languages, distributed under different versions of the Creative Commons license.
Project management, with a structure based on language groups and roles (organizers, language group leaders, language leaders, annotators, etc.).
Customizable annotation platform FLAT
Dedicated tools for checking annotation consistency and detecting silence (potentially missed VMWEs)
File formats (including parsemetsv and cupt), validators, converters and evaluation tools
Communication tools: mailing lists, Git issue tracker, websites
Data repositories (GitLab)
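Among the file formats above, `cupt` extends CoNLL-U with an 11th column, PARSEME:MWE. In that column, `*` marks a token outside any VMWE and `_` marks an underspecified token. A code such as `1:LVC.full` appears on the first token of a VMWE (number plus category), `1` appears on its remaining tokens, and several codes on one token are separated by `;`. Below is a minimal Python sketch, assuming the standard 11-column layout declared in the `# global.columns` header. It groups tokens into VMWEs and computes a simplified, exact-match, MWE-based precision/recall/F1. The names `read_cupt`, `mwe_f1` and `sentence` are hypothetical. This is not the official shared task validator or evaluation script.

```python
# Hypothetical helpers for illustration; not part of the official PARSEME tools.

def read_cupt(text):
    """Return one dict per sentence: MWE number -> (category, frozenset of token IDs).

    Assumes the standard 11-column layout, with PARSEME:MWE as the last column.
    """
    sentences = []
    for block in text.strip().split("\n\n"):
        groups = {}  # MWE number -> [category, set of token IDs]
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip comments such as "# global.columns = ..."
            cols = line.split("\t")
            tok_id, mwe = cols[0], cols[10]
            # Skip multiword-token ranges ("1-2"), empty nodes ("1.1"),
            # tokens outside any VMWE ("*") and underspecified ones ("_").
            if not tok_id.isdigit() or mwe in ("*", "_"):
                continue
            for code in mwe.split(";"):
                num, _, cat = code.partition(":")
                entry = groups.setdefault(int(num), ["?", set()])
                entry[1].add(int(tok_id))
                if cat:  # the category is given only on the VMWE's first token
                    entry[0] = cat
        sentences.append({n: (c, frozenset(t)) for n, (c, t) in groups.items()})
    return sentences


def mwe_f1(gold, pred):
    """Simplified MWE-based P/R/F1: a prediction counts only if its token set
    matches a gold VMWE exactly; categories are ignored."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_sets = {toks for _, toks in g.values()}
        p_sets = {toks for _, toks in p.values()}
        tp += len(g_sets & p_sets)
        n_gold += len(g_sets)
        n_pred += len(p_sets)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def sentence(rows):
    # Fill the 8 CoNLL-U columns not used here (LEMMA ... MISC) with "_".
    return "\n".join("\t".join([str(i), form] + ["_"] * 8 + [mwe])
                     for i, (form, mwe) in enumerate(rows, start=1))


HEADER = "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE"
s2 = [("She", "*"), ("made", "1:LVC.full"), ("a", "*"), ("decision", "1"), (".", "*")]
gold_text = "\n\n".join([HEADER + "\n" + sentence(
    [("He", "*"), ("took", "1:LVC.full"), ("a", "*"), ("walk", "1"), (".", "*")]), sentence(s2)])
pred_text = "\n\n".join([HEADER + "\n" + sentence(
    [("He", "*"), ("took", "1:LVC.full"), ("a", "1"), ("walk", "1"), (".", "*")]), sentence(s2)])

gold, pred = read_cupt(gold_text), read_cupt(pred_text)
print(gold[0])             # {1: ('LVC.full', frozenset({2, 4}))}
print(mwe_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

In the toy prediction, the first VMWE wrongly includes "a", so only one of the two predicted token sets matches a gold VMWE. The shared task's own evaluation tools also compute token-based scores, which give partial credit to partially matched VMWEs. This sketch covers only the exact-match, MWE-based case.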

Future work:

PARSEME gathered a large group of highly motivated contributors.
After the end of the PARSEME action in late April 2017, the future activities of this community, extended to a wider international context, have been coordinated by the SIGLEX-MWE section. New members are welcome to the section and may join it in two steps:
- joining SIGLEX via the web form (selecting the MWE section)
- joining the SIGLEX-MWE section by subscribing to the MWE mailing list
We plan to hold edition 1.2 of the shared task in 2020, dedicated to weakly supervised VMWE identification.