This page summarizes the outcomes of the PARSEME shared task on automatic identification of verbal MWEs (VMWEs), along with basic facts about it.
Basic facts:
- The PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) is based on a considerable collective effort undertaken by the European PARSEME COST action.
- The shared task editions 1.0 and 1.1 are now complete.
- In edition 1.0:
- 18 languages released their training and test corpora
- 7 systems participated, 5 of them multilingual; all 18 languages were covered (see the results)
- The 13th MWE workshop on 4 April 2017 in Valencia, Spain was the culminating event. It featured the shared task presentation paper and 6 system presentation posters (cf. the proceedings).
- In edition 1.1:
- 20 languages released their training and test corpora
- 17 systems participated, all of them multilingual; all 19 languages were covered (see the results)
- The LAW-MWE-CxG workshop on 25-26 August 2018 in Santa Fe, USA was the culminating event. It featured the shared task presentation paper and 8 system presentation posters (cf. the proceedings).
Outcomes and infrastructure:
- Universal guidelines, with examples in many languages and room for language-specific specifications.
- The final corpus released via the CLARIN/LINDAT infrastructure:
- edition 1.0: 5.5 million tokens and 60,000 VMWE annotations in 18 languages, distributed under different versions of the Creative Commons license.
- edition 1.1: 6 million tokens and 79,000 VMWE annotations in 19 languages, distributed under different versions of the Creative Commons license.
- Project management, with a structure based on language groups and roles (organizers, language group leaders, language leaders, annotators, etc.).
- Customizable annotation platform FLAT
- Dedicated tools to verify coherence and silence
- File formats (including parsemetsv and cupt), validators, converters and evaluation tools
- Communication tools: mailing lists, git issue tracker, websites
- Data repositories (gitlab)
Future work:
- PARSEME gathered a large group of highly motivated contributors.
- After the end of the PARSEME action in late April 2017, the future activities of this community, extended to a larger international context, have been coordinated by the SIGLEX-MWE section. New members are welcome to join the section. They may subscribe by following these two steps:
- joining SIGLEX via the web form (the MWE section should be selected)
- joining the SIGLEX-MWE section by subscribing to the MWE mailing list
- We plan edition 1.2 of the shared task in 2020, dedicated to weakly supervised VMWE identification.