Minutes of the Malta WG2 meeting

Session 1: March 19th, 16:00 - 17:30 -- Discussion regarding a book project

Following the book being written within the PARSEME WG1 (see http://typo.uni-konstanz.de/parseme/index.php/2-general/117-book-project-multiword-expressions-insights-from-a-multi-lingual-perspective), we plan to publish a collaborative book gathering contributions on the topic of MWE representation and parsing (i.e., in close relation to WG2).

During the meeting, we discussed the following points:

a. outline of the book

While WG1's book takes a multi-lingual perspective (on MWE encoding), it appeared during the discussion that taking the same perspective here could lead to some balance issues in terms of formalisms or aspects (resources vs parsing techniques).

A possible perspective for the WG2 book could be "challenging examples for grammar implementation (within hand-crafted / automatically acquired grammars) and deep parsing (using symbolic and/or statistical techniques)".

Each chapter could focus on a given type of challenging MWE, describe its encoding within a formal grammar, and its processing (e.g. to retrieve valid syntactic structures at parsing).

For the sake of consistency and readability, all chapters would follow common guidelines, which remain to be defined.

b. publication venue

Regarding where to publish this book, two options were proposed: (i) Cambridge Scholars Publishing (http://www.cambridgescholars.com/Linguistics), with which some WG2 members have already published and where the editing process went smoothly; and (ii) Language Science Press (http://langsci-press.org/), which is the publisher of WG1's book.

The discussion leaned towards contacting Language Science Press (LSP) and proposing that they publish WG2's book as a sequel to WG1's (i.e. as volumes 1 and 2 within the same series).

WG1's book is targeting the "Empirically Oriented Theoretical Morphology and Syntax" series (http://langsci-press.org/catalog/series/eotms). We could choose the same series and target, for instance, the "Implemented Grammar" subseries (http://langsci-press.org/catalog/series/eotms-ig).

c. reviewing process

We proposed to follow a process similar to the one used for WG1's book, namely sending out a Call for Contributions (e.g. on the WG2 mailing list).
Each contribution would be peer-reviewed. Selected contributions would be invited for a chapter in the WG2 book.

The reviewing process and the publication schedule have to be discussed with the LSP editors.

We plan to have more information about this at the next WG2 meeting in Sept. in Iasi, and plan to get the book published by the end of the PARSEME Action (March 2017).

Session 2: March 20th, 11:00 - 12:30 -- "Hands-on" session introducing two platforms for MWE-aware NLP

This session was the first practical session to take place within a WG2 meeting. It was intended as a tutorial giving participants hands-on experience with an existing platform for MWE processing, but we did not quite manage to do this. Instead, the session was split into two introductory talks, each giving a detailed presentation of one platform and of its underlying concepts and techniques.

Presentation #1. Deep Learning architecture by Giuseppe Attardi
Presentation #2. MWE toolkit by Carlos Ramisch and Sylvio Cordeiro

Due to time limitations, the MWE toolkit presentation had to be shortened. It will be resumed at the next WG2 meeting in Iaşi.

Here are the titles and abstracts of these two talks, along with links to the respective slides.

Presentation #1: Deep Learning and MWEs

Abstract: I will introduce the methods of Deep Learning and their use in NLP tasks. I will focus on a unified deep neural network architecture that can be shared among various tasks. The architecture exploits distributed semantic word representations that can be learned by means of unsupervised algorithms from unannotated plain texts. I will present a toolkit, based on this architecture, which includes tools such as taggers (POS, NER, SRL), a dependency parser, and tools for creating the word embeddings. I will describe how to use these tools in practice. I will also present the results of some ongoing experiments that use word embeddings to identify MWEs.

Homepage of the software:
http://www.di.unipi.it/~attardi/software.html

The slides from the presentation are available here.

Presentation #2: Getting our hands dirty with the mwetoolkit

Abstract: This tutorial is an introduction to corpus-based MWE extraction using the mwetoolkit. It is intended for linguists and computer scientists who want to explore corpora and automatically extract relevant expressions from them. The tutorial will include many practical exercises on a small POS-tagged corpus of English transcribed talks. We will define interesting expression patterns using multi-level regular expressions in XML, then apply them to the corpus, calculate word and expression frequencies, generate features and evaluate the results.
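
As a rough illustration of the workflow described in this abstract (match patterns against a POS-tagged corpus, count words and candidate expressions, derive a feature), here is a minimal Python sketch. It does not use the mwetoolkit or its XML pattern format: the toy corpus, the POS-sequence patterns, the function names and the choice of pointwise mutual information as the feature are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Toy POS-tagged corpus (invented for illustration): each sentence is a
# list of (lemma, POS) pairs.
CORPUS = [
    [("climate", "NOUN"), ("change", "NOUN"), ("be", "VERB"),
     ("a", "DET"), ("big", "ADJ"), ("problem", "NOUN")],
    [("we", "PRON"), ("discuss", "VERB"),
     ("climate", "NOUN"), ("change", "NOUN")],
    [("the", "DET"), ("big", "ADJ"), ("change", "NOUN"), ("come", "VERB")],
]

# Hypothetical patterns: plain sequences of POS tags. This is a strong
# simplification of multi-level patterns, which can also constrain
# surface forms and lemmas.
PATTERNS = [("NOUN", "NOUN"), ("ADJ", "NOUN")]


def extract_candidates(corpus, patterns):
    """Count every contiguous lemma sequence whose POS tags match a pattern."""
    counts = Counter()
    for sentence in corpus:
        for pattern in patterns:
            n = len(pattern)
            for i in range(len(sentence) - n + 1):
                window = sentence[i:i + n]
                if all(tag == wanted for (_, tag), wanted in zip(window, pattern)):
                    counts[tuple(lemma for lemma, _ in window)] += 1
    return counts


def word_frequencies(corpus):
    """Count individual lemmas over the whole corpus."""
    return Counter(lemma for sentence in corpus for lemma, _ in sentence)


def pmi(candidate, candidate_counts, word_counts):
    """Pointwise mutual information (log base 2) of a candidate expression,
    using the corpus token count as the normalizer for all probabilities."""
    total = sum(word_counts.values())
    p_joint = candidate_counts[candidate] / total
    p_independent = math.prod(word_counts[w] / total for w in candidate)
    return math.log2(p_joint / p_independent)


if __name__ == "__main__":
    candidates = extract_candidates(CORPUS, PATTERNS)
    words = word_frequencies(CORPUS)
    # Rank candidates by association strength, strongest first.
    for cand in sorted(candidates, key=lambda c: pmi(c, candidates, words), reverse=True):
        print(" ".join(cand), candidates[cand], round(pmi(cand, candidates, words), 2))
```

On real data, the resulting ranking would still need to be evaluated against a reference list or by manual annotation, which corresponds to the last step mentioned in the abstract.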

Further reading: http://aclweb.org/anthology-new/W/W12/W12-3311.pdf
Mwetoolkit website: http://mwetoolkit.sourceforge.net

The slides from the talk are available here.