PARSEME WG1 hands-on workshop on lexical encoding of MWEs

Event title: PARSEME WG1 hands-on workshop on lexical encoding of MWEs

Location: Ferdinand Hall, Alexandru Ioan Cuza University, Iași, Romania

Dates: 21-22 September 2015 (co-located with PARSEME's 5th general meeting on 23-24 September)

Hosting Institution: Alexandru Ioan Cuza University, Iași, Romania

Invited speaker: Prof. Jan Odijk, Universiteit Utrecht, the Netherlands

Introduction to the LMF framework and a presentation of DUELME, a Dutch MWE (proto-)lexicon in LMF format

Workshop Organizers: Gyri Smørdal Losnegaard, Carla Parra Escartín, Manfred Sailer


Preliminary program

Monday, 21 September
13:00-14:00	Introduction by the workshop leader, short presentation of the participants (max. 3 min. each)
14:00-14:45	Lecture 1: Jan Odijk, DUELME
14:45-15:15	Break
15:15-16:00	Lecture 2: Jan Odijk, DUELME in LMF
16:00-16:15	Break
16:15-18:00	Practical session 1: Creation of lexicon entries for MWEs in the data set; identification of practical problems

Tuesday, 22 September
9:00-10:30	Discussion 1: Debugging of practical problems I: find solutions to simpler problems
10:30-10:50	Break
10:50-12:50	Discussion 2: Debugging of practical problems II: find solutions to more advanced problems
12:50-14:00	Lunch
14:00-16:00	Practical session 2: Documentation - production of short videos on how to encode particular MWEs
16:00-16:20	Break
16:20-17:20	Discussion 3: Identification and documentation of the main challenges arising from MWE encoding in general and the LMF standard in particular
17:20-17:30	Break
17:30-18:30	Discussion and preparation for a joint publication

Rationale: The idea behind the workshop is to work hands-on with the encoding of linguistic (and other) properties of MWEs. Evaluation of frameworks for lexical encoding is a prioritized task in PARSEME, and the main objectives of this workshop are to make recommendations for the development of MWE lexicons and databases and to work towards the development of best practices.

A framework for MWE encoding (i.e., an MWE lexicon/database model) should ideally meet at least the following requirements:

support rich linguistic descriptions
support metadata specifications
be language independent
be theory neutral
be NLP compatible
be reusable and interoperable

The Lexical Markup Framework (LMF) will be used for lexical encoding. LMF is a standardized framework for the development of computational dictionaries and is recommended as a standard by large international language resource infrastructure initiatives such as CLARIN and META-NET. It is based on standard formalisms for data description and modeling, and adheres to the above requirements.

Modalities: An MWE data set will be created in advance, with both straightforward examples and more difficult cases from all languages represented at the workshop. During hands-on sessions, participants will try to create lexical entries in LMF format for the MWEs in the data set. The lexical encoding of the more straightforward cases will be recorded as short "encoding do-it-yourself videos", while general challenges and challenging cases will be discussed in a problem-solving session. Proposed solutions to the more difficult cases will also be recorded and made available as an e-learning resource. Participants will be encouraged to plan and write a publication summing up the experiences from the workshop.
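To give a rough idea of what a lexical entry for an MWE in LMF format might look like, here is a minimal sketch in Python that builds a simplified XML entry. The element and attribute names (LexicalEntry, Lemma, ListOfComponents, Component, feat) are modeled on the LMF XML serialization and should be checked against the standard itself; the helper functions, the identifiers and the English example "kick the bucket" are hypothetical and are not taken from the workshop data set or from DUELME.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> child to parent."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def build_mwe_entry(entry_id, lemma, component_ids, pos="verb"):
    """Build a simplified LexicalEntry element for a multiword expression.

    All identifiers passed in are illustrative; a real lexicon would
    define the component entries elsewhere in the same resource.
    """
    entry = ET.Element("LexicalEntry", {"id": entry_id})
    feat(entry, "partOfSpeech", pos)

    lemma_el = ET.SubElement(entry, "Lemma")
    feat(lemma_el, "writtenForm", lemma)

    # Each component points to the lexical entry of one word in the MWE.
    components = ET.SubElement(entry, "ListOfComponents")
    for comp_id in component_ids:
        ET.SubElement(components, "Component", {"entry": comp_id})
    return entry


# Hypothetical example entry (not from the workshop data set).
entry = build_mwe_entry(
    "mwe_kick_the_bucket",
    "kick the bucket",
    ["kick_v", "the_det", "bucket_n"],
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A fuller encoding would add the linguistic properties discussed at the workshop (for example fixedness of components or permitted syntactic variation), which this sketch deliberately leaves out.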

Participants: about 15 experts on various languages (ideally 1 per language); computational linguists, computational lexicographers (PARSEME members have priority)

Important dates:

~~1 June 2015: registration deadline~~
~~15 June 2015: notification of admission~~
~~9 July 2015: notify workshop organizers about special topic(s) of interest and data sets (if you have your own data and want to use this for encoding)~~
~~31 July 2015: feedback from participants regarding reading materials and data (provide relevant new MWE examples for encoding if necessary)~~
~~7 September~~ ~~30 August 2015: submission of workshop input data (encoding examples, challenges, possible solutions to problems)~~

Registration: if you are interested in attending the workshop, please fill in the registration form. The workshop organizers will select those participants who will be entitled to reimbursement of their travel and stay.

Accommodation and reimbursement: see the webpage of PARSEME's 5th general meeting

A reading list and a folder with the relevant documents were sent out to the workshop participants on July 3rd. If you are attending the workshop but for some reason did not receive this email (e.g. the attachment was too large), please contact the workshop organizers as soon as possible and they will find a different way of providing these materials.