This page summarizes the outcomes of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs), along with key facts about it.
- The shared task builds on a considerable collective effort undertaken by the European PARSEME COST action.
- The shared task edition 1.0 is now complete.
- Training and test corpora were released for 18 languages.
- 7 systems participated, 5 of them multilingual, and all 18 languages were covered (see the results).
- The 13th MWE workshop, held on 4 April 2017 in Valencia, Spain, was the culminating event. It featured the shared task presentation paper and 6 system presentation posters (cf. the proceedings).
Outcomes and infrastructure:
- Universal guidelines, with examples in many languages and room for language-specific specifications.
- The final corpus of 5.5 million tokens and 60,000 VMWE annotations in 18 languages, distributed under different versions of the Creative Commons license.
- Project management, with a structure based on language groups and roles (organizers, language group leaders, language leaders, annotators, etc.).
- Customizable annotation platform FLAT.
- Dedicated tools to verify coherence and silence.
- File formats (including parsemetsv), validators, converters and evaluation tools.
- Communication tools: mailing lists, git issue tracker, websites.
- Data repositories (gitlab).
- PARSEME gathered a large group of highly motivated contributors.
- After the end of the PARSEME action in late April 2017, the future activities of this community, extended to a larger international context, will be coordinated by the SIGLEX-MWE section. New members are welcome and may join by subscribing to the section's mailing list.
- We plan editions 1.1 and 2.0 of the shared task for 2018 and 2019, with new languages, enhanced guidelines and corpora, and an extended scope.