PARSEME 2nd Training School
La Rochelle, 27 June - 1 July 2016
============================================

I. PROJECT GROUPS AND CHALLENGING ISSUES
(long list; see the webpage for the short list: http://typo.uni-konstanz.de/parseme/index.php/2-general/180-la-rochelle-training-school-project-groups#groups)

[DISC, Fabienne] discovery issues:
* how to distinguish idiomatic from literal (compositional) readings? how to detect MWEs in context? (Aedmaa, Blagus, Majchrakova, Mititelu, Sanchez, Taslimipoor)
* Use of statistical approaches in the discovery and description of MWEs – cases of success, cases of failure, role of possible additional lexical information and/or resources. (Mandravickaite)
* differentiating multiword terms from collocations (Léon Araúz)
* Algorithms for extracting MWEs from parallel corpora (Petrovski)
* Conflating variants of the same MWE. How to automatically identify inflected forms of MWEs in a corpus of an agglutinative language, while only their base forms are present in the lexicon, i.e. how to match a lexicon against a corpus? (Artaud, Adali)
* diachronic studies for distinguishing fully from partially grammaticalized items (Ganfi)
* using Multi-Word Expressions for post-OCR correction (Chiron)

[LEX, Adam, Agnieszka] lexicon (encoding) issues:
* How to describe, in a lexicon, MWEs that permit some degree of variation? (Bejcek)
* The formalized representation of verb idioms’ syntagmatic peculiarities and all quantitative and positional changes of the idiom components. (Todorova)
* How to automatically obtain an NLP-oriented MWE lexicon from legacy dictionaries (with human-oriented descriptions)? MWEs must be analyzed to understand which words change in various contexts and which are fixed (legacy dictionaries include no such information). (Pretkalnina)
* Productivity of different elements in an MWE. (Sarg)
* lexicon model for MWEs to support transcription of ancient manuscripts (Kesiman)

[PARS, Matthieu] parsing issues:
* impact of transformations of MWE representations (collapsing into words-with-spaces?) in treebanks on parsing accuracy (de Lhoneux)
* estimators of the probabilities of the grammar entries corresponding to MWEs (Waszczuk)
* which types of MWEs should be handled at the stage of syntactic parsing? (Waszczuk)
* how can a parser deal with reduplication MWEs and verbs with prefixes in Indonesian languages (Grangé)
* distinguishing between an idiomatic and a compositional meaning of the same string of words in MT and parsing (Mititelu)

[ANN, Victoria] annotation issues:
* Annotation of MWEs in the Universal Dependencies (Klyueva)
* Can the syntagmatic expressions examined so far be applied in corpus marking? (Matas)
* Is it possible to represent semantically idiosyncratic MWEs in (syntactic) treebanks? How? (Ramisch)
* status of metaphors wrt. collocations and MWEs (Vincze)
* How to automatically identify inflected forms of MWEs in a corpus of an agglutinative language, while only their base forms are present in the lexicon, i.e. how to match a lexicon against a corpus? (Adali)

[SEM, Koenraad] semantic issues:
* How to correlate free arguments of an MWE with the free arguments of its meaning/paraphrase? (Margariti)
* Emotional charge of a particular MWE (Priego)
* Inferring semantic relations codified within multiword terms (Léon Araúz)
* The role of negation in MWEs. Semantic polarity of syntactically negated MWEs and its links with lexicalization degree. Periphrastic MWEs and their negation (Piunno, Herrero)

[TRANS, Jörg, Agata] translation issues:
* How to identify MWE equivalents in comparable corpora? (Sanchez)
* Pairs of idioms with the same compositional meaning but different idiomatic meanings (Simko)
* The decoding issue of the free arguments of MWEs (Zakis)
* Applying resources of rhetorical figures in machine translation (Mitrovic)
* MWEs and Machine Translation (Petrovski)
* Is it possible to integrate MWE resources (mainly lexicons) into SMT systems in a sound way, without using workarounds and patches? How? (Ramisch)
* how NLP methods can help language learners in memorizing and understanding MWEs (Knapp)
* distinguishing between an idiomatic and a compositional meaning of the same string of words in MT and parsing (Mititelu)

============================================

II. CHALLENGING ISSUES SUBMITTED BY THE TRAINEES

=============================
Adali (see also a .txt file):
- How to automatically identify inflected forms of MWEs in a corpus of an agglutinative language, while only their base forms are present in the lexicon, i.e. how to match a lexicon against a corpus?
=============================
Artaud:
- delimitation and interpretation of abbreviations and acronyms
=============================
Eleri Aedmaa (see also a pdf):
- How to detect the compositionality of an MWE if its components are far from each other in the text?
- How is it possible to “tell” a computer that components constitute a meaningful unit without writing them into a lexicon?
=============================
Hiwa Asadpour (see also a pdf):
- Morphosyntactic annotation of Urmia language corpora (tagset design, database design, different transcription systems, ...).
=============================
Eduard Bejcek:
- How to describe, in a lexicon, MWEs that permit some degree of variation? For example:
  - An MWE with some secondary variant, like
    - rodinný dům/domek (family house/small house) -- diminutive
    - mistr/mistryně světa (world champion.masc/world champion.fem) -- feminization
    - United States of America / USA -- abbreviation
  - An MWE with a (hopefully) closed set of variants and similar but not identical meaning, like
    - stát v popředí / na výsluní (lit.: to-stand in front / in the-sun; to be important)
    - stát v pozadí / ve stínu (lit.: to-stand in back / in the-shadow; to be invisible, to be less important)
  - An MWE with an unlimited number of variants, like
    - dostat 5 měsíců natvrdo / dostat 3 roky podmíněně (to be sentenced to 5 months, unsuspended / to be sentenced to 3 years, suspended)
    - drží nejvyšší/první/druhou/třetí/předposlední/... příčku (lit.: he-holds top/first/second/third/last but one/... step; he is first/first/second/third/last but one/... in the charts)
=============================
Goranka Blagus Bartolec (see also a pdf):
- How to detect ambiguous meanings of MWEs (literal vs. idiomatic meanings, meanings depending on the use of singular or plural) using automatic extraction?
=============================
Chiron:
- using Multi-Word Expressions (and even Multi-Symbol Expressions) for post-OCR correction
=============================
Vittorio Ganfi (see also a pdf):
- Multiword prepositions: corpus studies supporting the following issues:
  * diachronic studies for distinguishing fully from partially grammaticalized items
  * relation between the schematicity and the fixedness of multiword prepositions and their frequency
=============================
Grangé:
- How can a parser deal with reduplication MWEs in Indonesian?
- How to parse verbs with prefixes?
=============================
Carlos Herrero (see also a pdf):
- Challenges when processing Spanish modal periphrastic constructions (tener + que + V, ir + a + V, etc.).
=============================
Made Windu Kesiman:
- lexicon model for MWEs to support transcription of ancient manuscripts
=============================
Natalia Klyueva:
- Annotation of MWEs in the Universal Dependencies (UD). Are the UD really universal with respect to MWEs (for some languages in UD, many more MWEs were annotated than for others)?
- How to represent MWEs in vertical files (e.g. as structure attributes in the form of XML tags in combination with the dependency relation attributes)?
=============================
Alfred Knapp:
- how NLP methods can help language learners in memorizing and understanding MWEs
=============================
Pilar Léon Araúz:
- differentiating multiword terms from collocations
- inferring semantic relations codified within multiword terms
=============================
Miryam de Lhoneux:
- How to evaluate the impact of transformations of MWE representations in treebanks on parsing accuracy.
=============================
Daniela Majchrakova:
- Where does the borderline lie between MWEs and non-significant word combinations?
=============================
Justina Mandravickaite (see also a pdf):
- Use of statistical approaches in the discovery and description of MWEs – cases of success, cases of failure, role of possible additional lexical information and/or resources.
=============================
Elpiniki Margariti (see also a pdf):
- How to correlate free arguments of an MWE with the free arguments of its meaning/paraphrase?
=============================
Ivana Matas Ivanković:
- How to treat pronominal syntagmatic expressions: are they MWEs, since they display syntactic, statistical and partial semantic idiomaticity, e.g. nitko od kritičara ‘none of the critics’?
- How to generate syntagmatic expressions and MWEs containing prepositions based on examples from a corpus?
- Can the syntagmatic expressions examined so far be applied in corpus marking?
=============================
Verginica Mititelu:
- Automatic possibilities of distinguishing between an idiomatic and a compositional meaning of the same string of words. This concerns both MT and parsing.
=============================
Jelena Mitrović:
- applying resources of rhetorical figures in machine translation
- applying resources of rhetorical figures in corpus annotation
=============================
Alexandar Petrovski:
- MWEs and Machine Translation
- Algorithms for extracting MWEs from parallel corpora
=============================
Valentina Piunno (see also a pdf):
- The role of negation in MWEs. Semantic polarity of syntactically negated MWEs and its links with lexicalization degree.
=============================
Lauma Pretkalnina:
- How to automatically obtain an NLP-oriented MWE lexicon from legacy dictionaries (with human-oriented descriptions)? MWEs must be analyzed to understand which words change in various contexts and which are fixed (legacy dictionaries include no such information).
=============================
Belém Priego Sanchez:
- Emotional charge of a particular MWE, notably its polarity (positive, negative or neutral).
=============================
Carlos Ramisch:
- Is it possible to represent semantically idiosyncratic MWEs in (syntactic) treebanks? How?
- Is it possible to integrate MWE resources (mainly lexicons) into SMT systems in a sound way, without using workarounds and patches? How?
=============================
Matiss Rikters:
- Integrating processing of MWEs in statistical and hybrid MT
=============================
Beatriz Sanchez:
- How to distinguish free word combinations from meaningful MWEs in terminology?
- Do the meaningful morphosyntactic patterns that can be used to identify MWEs vary across specialized domains?
- How to identify MWE equivalents in comparable corpora?
=============================
Dage Sarg (see also a pdf):
- Productivity of different elements in an MWE.
=============================
Katalin Simko (see also a pdf):
- Pairs of languages may have certain idioms with the same compositional meaning (‘to jump out of one’s skin’) in both languages, but different idiomatic meanings (‘to be very scared’ in English and ‘to be very excited’ in Hungarian). How to cope with this issue in computational and theoretical linguistics?
=============================
Shiva Taslimipoor (see also a pdf):
- Identifying MWEs in context (token-wise rather than type-wise).
=============================
Maria Todorova:
- The formalized representation of verb idioms’ syntagmatic peculiarities and all quantitative and positional changes of the idiom components.
- Approaches to automatic classification of verb idioms’ syntagmatic types.
- Parsing of synthetic and analytical idiomatic verb forms combined with a complex and flexible word order and different structural peculiarities, such as mandatory components, discontinuous components, etc.
- Approaches for grouping verb idioms into structural types and into formal paradigmatic subtypes, respectively, in morphologically rich languages.
=============================
Veronika Vincze:
- What is the status of metaphors? As they are not totally compositional (i.e. their meaning cannot be calculated from the original meaning of the words), should they be considered multiword expressions, e.g. idioms? Or should they be treated differently from both compositional phrases and MWEs? Examples: My heart was broken. His temper was boiling. Waves of spam emails inundated his inbox.
=============================
Jakub Waszczuk:
- How to learn reliable estimators of the probabilities of the grammar entries corresponding to MWEs, given their sparseness and poor representativeness in the existing treebanks?
- At which stages of the parsing pipeline should different types of MWEs be handled? In particular, which types of MWEs should be handled at the stage of syntactic parsing?
=============================
George Zakis (see also a pdf):
- Translational problems in MWEs: the decoding issue of the free arguments of MWEs. A brief examination of the genitive dative pronouns in MG MWEs.