WG2 Meeting, September 24th, Iași, Romania

Session 1: 9:00-10:30 AM, Ferdinand Hall

Title: Getting our hands dirty with the mwetoolkit (Part 2)

Authors: Carlos Ramisch and Silvio Cordeiro (LIF, Aix-Marseille University, France)

Abstract: This tutorial is an introduction to corpus-based MWE extraction using the mwetoolkit. It is aimed at linguists and computer scientists who want to explore corpora and automatically extract relevant expressions from them. The tutorial will include many practical exercises on a small POS-tagged corpus of transcribed English talks. We will define interesting expression patterns using multi-level regular expressions in XML, apply them to the corpus, calculate word and expression frequencies, generate features and evaluate the results. The only prerequisite is access to a laptop with a command-line interpreter (Cygwin on Windows, or a Linux/Mac terminal).

Those of you who want to use the mwetoolkit during the tutorial are advised to install the following on their computer before the tutorial:
+ Python 2.7 (already installed on most Linux distributions)
+ The latest stable release of the mwetoolkit:
     - Instructions for Windows, Mac and Linux at http://mwetoolkit.sourceforge.net/PHITE.php?sitesig=MWE&page=MWE_010_Install

Further reading: http://aclweb.org/anthology-new/W/W12/W12-3311.pdf
Mwetoolkit website: http://mwetoolkit.sourceforge.net

NEW: the material used for the tutorial is available via this link.

Session 2: 4:00-5:30 PM, Ferdinand Hall

Title: Developing a toy grammar of MWE using XMG

Author: Simon Petitjean (Universität Düsseldorf, Germany)

Abstract: This tutorial is an introduction to eXtensible MetaGrammar (XMG), a grammar engineering tool that generates linguistic resources (for example, grammars or lexicons) from a compact, abstract description (the metagrammar). XMG's high modularity allows great flexibility both in the type of resources created and in the languages used to describe them. The concept of dimensions makes it possible to separate different levels of linguistic representation and to use description languages adapted to the specific structures involved in each. We will focus on two of these dimensions, dedicated to the description of syntactic trees and typed feature structures, and walk through the steps of developing a toy metagrammar for multi-word expressions within them.

Those of you who want to use XMG during the tutorial are advised to install the following on their computer before the tutorial:
+ The latest stable release of XMG:
     - Instructions at
     - Two alternatives are offered: either install XMG (Linux only) or download the ready-to-use VirtualBox disk image

Further reading: http://jlm.ipipan.waw.pl/index.php/JLM/article/view/96
XMG website: https://code.launchpad.net/xmg-ng