PARSEME shared task Skype meeting
29 March 2016, 9:00 CET
Marie (mariecandito), Voula (vgiouli1), Veronika (akinorev1981), Agata (agata.savary), Fabienne (fritzife03), Ivelina Stoyanova (iv100yanova)
============
Minutes:

1. Advances in phase 2:
   - Voula:
      * Greek, Maltese and Turkish completed, 2 annotators per language
      * Hungarian - to be done in a couple of days
   - Fabienne: German (2 complete + others still to come); Swedish (2-3 complete); English (1 complete); Yiddish (no news so far)
   - Marie: Portuguese (4 completed + detailed feedback); French (2.5 completed + some feedback); Spanish (will be late); Romanian (ongoing); Italian (no feedback, late start)
   - Agata: Polish (2 complete), Ivelina (2 complete, detailed feedback; the guidelines seem better than in phase 1)

2. Testers for the two annotation tool candidates
   - Veronika, Ismail (for the English annotations)

3. Feedback on IPronVs, IPrepVs
   - Marie:
      * IPrepV: the criterion of the "substantial change in meaning" is a bit vague; maybe more detailed tests are needed?
      * IPrepV: what if the (compulsory) preposition alternates with another preposition?
      * cases which seem to fit several categories (IPrepV and IPronV within ID; should we promote IDs in this case? or should we say that an IPrepV/IPronV is embedded in an ID?)
      * proposal: reproduce the DIMSUM test for IPrepVs in our guidelines
   - Voula:
      * same issue of IPronVs embedded within an ID
      * the IPronV category was understood more broadly than for Romance and Slavic languages (with any clitic, not only a reflexive one)
      * the guidelines should probably explicitly state that this category does not apply to Greek
   - Fabienne: no issues for these categories
   - Ivelina: IPronVs and IPrepVs apply to Bulgarian, no issues so far

4. Feedback on LVCs:
   - Fabienne: no issues so far
   - Ivelina:
      * the hardest category to annotate
      * is an LVC a VMWE at all?
   - Voula:
      * fewer discrepancies than in phase 1 (better agreement)
      * test 17 does not always apply (Agata: this test is not compulsory)
   - Marie:
      * issues with test 11 for verbs with a general meaning (Veronika: probably solved by tests 18-22)

5. Feedback on IDs
   - Marie:
      * seen as a "default" category
      * it would be useful to decide which categories are systematically chosen in case of ambiguity
   - Voula:
      * ambiguities between LVCs and IDs are relatively frequent
   - Ivelina:
      * embedded MWEs are frequent in LVCs; the question often arises whether the whole is a VMWE or just the nominal component (Agata: see the remarks on the copula head verbs, p. 15)
   - Fabienne: no major issues
   - Agata:
      * if the NP complement has a literal meaning, the status of the whole as a VMWE is less evident than for totally figurative cases

6. Feedback on VPCs
   - Fabienne:
      * hesitation as to which kinds of constructions should be annotated - all VPCs? only the non-compositional ones? how to distinguish them?
      * mostly resolved thanks to the input from Sabine Schulte im Walde (German expert on LVCs): additional tests defined
   - Agata:
      * VPCs written in one token should be annotated; these are typical MWT cases. Two reasons: (i) they should have the same annotation status whether they occur with a separated particle or not, since they are all morpho-syntactic variants of the same construction; (ii) separable VPCs should be distinguished from non-separable ones, and in some cases (e.g. um|fahren) this is ambiguous out of context (Fabienne & Veronika agree to annotate them)

7. Feedback on OTH
   - Ivelina:
      * in Bulgarian this category includes cases of MWEs which are derived from verbs but have lost the verbal meaning (Agata: such cases were excluded from the universal guidelines; they can freely be annotated as a language-specific category, but a different label should be used)
   - Agata:
      * comparative VMWEs are interesting cases in OTH (semantically largely compositional but lexically fixed); they might be mentioned in the description of this category

8. Should hesitation labels (LVC/ID, LVC/_ etc.) be kept for the final annotation?
   - Agata: it would be easier to abandon them, for several reasons:
      * they have an ambiguous meaning: (i) lack of confidence of the annotator, (ii) true ambiguity meant by the text author, (iii) not enough context available to disambiguate
      * they are hard to account for in the IAA calculation and in the final shared task evaluation
      * the status of a hesitation label including a '_' is different from the others (hesitation between MWE/non-MWE status vs. hesitation between two categories)
   - Ivelina: no opinion
   - Marie: maybe the guidelines should explicitly mention which category is preferred in case of hesitation
   - Voula: hesitation labels were useful for discussions and resolving divergences (Agata: they might be replaced by an additional column with a confidence score)
   - Fabienne: OK to abandon the hesitation labels

9. Final corpus choice:
   - Voula: experiments were made in Greek in phase 2 with user-generated content; many difficult issues encountered (typos, deviations, customized expressions); this genre should not be included in the shared task
   - Veronika: Wikipedia articles annotated in Hungarian; this genre is close to newspaper texts
   - Marie: Wikipedia articles considered for French
   - open licenses strongly promoted over compatibility with the existing corpora

10. Organizing the Annotathon in Struga:
   - session 1 (Thursday):
      * progress so far
      * summary of the guidelines v5
      * demo of the annotation merging script
      * most challenging issues in phases 1 and 2; glossed examples in several languages
      * golden rules to discuss
         # choice of the corpus
         # independent double annotation of a part of each corpus (for the IAA calculation)
      * brainstorming on the decisions to make
   - session 2 (Friday):
      * demo of the 2 annotation tools (FLAT vs. PARSEMEbot)
      * testing the tools by participants
      * choice of the tool
   - list of decisions to make:
      * choice of the tool
      * choice of the corpus genre
      * availability of the corpus (high priority to open availability over compatibility with the existing corpora)
      * substantial changes to the guidelines if needed
      * language-specific examples not compulsory for the whole guidelines

=====
TODO:

Federico:
   - check if a column for a confidence score (between 0 and 1) may be added to the annotation tools (value 1 by default, but editable if needed)
   - prepare a demo of the 2 annotation tools for the Friday session

Ivelina:
   - send glossed challenging examples in Bulgarian
      * ambiguity between a VMWE and an embedded MWE
      * give a different (than OTH) label in Bulgarian to MWEs which stem from verbs but have lost their verbal meaning

Voula:
   - send glossed challenging examples in Greek
      * elliptical LVC coordinated with ID
      * ambiguity between LVC and ID
   - make the Greek language-specific guidelines available (if translated to EN)
   - send a summary of the "other" group feedback once all annotations are complete

Fabienne:
   - send the tests for German VPCs (from Sabine)
   - send a summary of the Germanic group feedback once all annotations are complete

Marie+Carlos:
   - send glossed challenging examples of ambiguities between various categories
   - send a summary of the Romance feedback once all annotations are complete

Agata+Veronika:
   - promote a confidence value in the annotation format to signal hesitation
   - guidelines enhancements:
      * add page numbers
      * for each quasi-universal category: the list of languages to which this category does/does not apply
      * require the annotation of one-token idiomatic VPCs in German etc. (e.g. vor|bereiten, auf|ge|macht)
      * mention the comparative VMWEs (e.g. "to sleep like a log") in OTH
      * add DIMSUM-like tests for IPrepVs, try to formalize the "substantial change in meaning"
      * which category to systematically prefer in case of hesitation?
   - (Agata) inform Federico about the English testers for the 2 tools
   - specify the final corpus genre: newspaper texts + Wikipedia articles
   - stress the strong preference for open source licenses (for the corpora) over compatibility with the existing corpora

Veronika:
   - slides for Struga on the general progress report
   - slides for Struga summarizing guidelines v5
   - slides with challenging examples sent by the LGLs

All LGLs:
   - 2-3 slides on the progress in their groups (even if you don't come to Struga)