PARSEME general meeting
Annotathon, 26 September 2016, Dubrovnik
Minutes (by Carlos and Agata)

1. ST latest news - Veronika's presentation
  - Overall summary of shared task
    - Teams, guidelines, infrastructure...
  - dates of shared task too tight?
    - send announcement soon
  - website with all details?
    - exists

2. HTML guidelines - Silvio's and Carlos' presentation
  - What's new and how to edit examples
  - Several examples in one language? We'll add an explanation about the tag
  - we must check: DE examples in EN part
  - numbering tests 8.1 -> 7.1

3. FLAT demo - Federico's and Maarten's presentation
  - how to create an account, log in, upload file
  - Difference between Comment and Description?
    - comments are more "free"
  - Different colors for multiple examples?
  - Query for low-confidence MWEs
    - not existing but would be interesting
  - Download the file
    - XML for each annotator
  - Upload many small files
    - use shared drive (see 5.)

4. Preparing the final corpus: Voula's and Agata's presentation
  - Choice of corpora, format for files, manage annotation - see the slides
  - Annotation management, spreadsheets, google drive

5. Remaining issues in the guidelines - Marie's and Iva's presentation
  - LVCs
    - the notion of "bleached" semantics for light verbs is unclear for verbs with very general meaning (e.g. give, have)
    - LVC issue: the verb is the syntactic head, but the noun is the semantic head, and it is the noun which selects the verb (or a list of verbs) rather than the opposite. This is contrary to the status of the verb in other VMWEs.
    - Iva: agrees on the structure but the footnote on LVCs should be checked
    - Emphasis should be put on the fact that our view on LVCs is non-standard
  - Farsi linguists are not very happy with the guidelines terminology. Should we create a language-specific category for Compound Verbs in Farsi?

6. System evaluation tools - Antoine's presentation
  - Evaluation of shared task systems
  - Ground truth: the annotations that people agree on
  - Evaluation metrics
    - per-MWE or per-token
    - see previous work (e.g. DIMSUM)
  - Tracks: open (with use of external resources, provided that they are described), closed (with no external resources), etc.?
    * Symbolic systems would be in the open track (they use grammars).
    * The closed track encourages cross-language systems. But most systems would rely at least on POS tags
  - How to present the evaluation results: per language? per language group? global average? Should VMWE categories be distinguished (they are not the same for all languages)?

7. PLATINUM: v6-compatible
  The platinum corpora from phase 2 should be adjusted to v6 of the guidelines. Here is what we propose:
  - we keep the existing platinum versions intact, so as to keep a trace of the common decision
  - we create a copy of the platinum version for each language, and we add it to the master spreadsheet in the "Phase 2" sheet as "Platinum (compatible to guidelines v6)"
  - we adjust the validity checks so as to
    * eliminate the IPrepV category
    * transform IPrepV to IPronV
    * eliminate the hesitation labels

====
TODO:

Carlos and Silvio:
- fix the issues in the HTML guidelines (point 2 above)
+ instruct the LLs about filling in language-specific examples in the guidelines

Federico:
+ prepare the platinum spreadsheets compatible with guidelines v6 (HTML version)

Federico & Behrang:
- define groups and workspace visibility in FLAT, according to the language teams and language groups

LLs:
- encourage all annotators to create FLAT accounts and test FLAT
- upgrade the platinum standard to guidelines v6
- fill in the language-specific examples in the guidelines
- prepare the final corpus
- start the annotation
  * deadline: end December
  * objectives: 3,500 annotated VMWEs

Agata, Veronika, Antoine, Carlos:
- prepare the 1st call for ST participants
- add the deadline for publishing the platinum standards

Behrang et al:
- Create Farsi category for Compound Verbs + guidelines?

Guidelines authors:
- stress the fact that our definitions of LVCs, IDs etc. are non-standard
- check if the guidelines specify that the head verb belongs to the set of lexicalized components of a VMWE, and if this is not contradictory with the LVC description
- check the footnote on LVCs

Agata:
+ publish the Annotathon minutes
+ finish the LL guide
- gather the email addresses to be used with Google drive and give access
+ finish preparing the Google drive for the annotation files
+ finish preparing the file assignment spreadsheet
- send feedback from the Annotathon
  * link to the Dubrovnik Annotathon page
  * remind about the available documents
  * encourage filling in the examples in the guidelines
  * announce communication channel via Telegram
  * organise testing new FLAT functionalities
  * instruct the LLs that the double annotation for IAA should be done first
  * remind the languages with very poor IAA in phase 2: French, English, Hungarian, Polish, Italian, Portuguese, Spanish

Maarten (ideas of new functionalities for FLAT):
- Displaying low-confidence annotations
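
Note on point 6 (evaluation metrics): the difference between per-MWE and per-token scoring can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the shared task's evaluation tool: each VMWE is represented as a set of token indices within one sentence, the function names are hypothetical, and the per-token variant is a simplified one that only checks whether a token belongs to some VMWE.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical representation: one VMWE = the set of its token indices in a sentence.
MWE = FrozenSet[int]


def prf(tp: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def per_mwe_scores(gold: Set[MWE], pred: Set[MWE]) -> Tuple[float, float, float]:
    """Per-MWE: a prediction counts only if it matches a gold VMWE exactly."""
    return prf(len(gold & pred), len(pred), len(gold))


def per_token_scores(gold: Set[MWE], pred: Set[MWE]) -> Tuple[float, float, float]:
    """Per-token (simplified): compare the sets of tokens covered by any VMWE."""
    gold_tokens = set().union(*gold)
    pred_tokens = set().union(*pred)
    return prf(len(gold_tokens & pred_tokens), len(pred_tokens), len(gold_tokens))


# Toy sentence: the system finds one VMWE exactly and one only partially.
gold = {frozenset({1, 2}), frozenset({5, 6, 7})}
pred = {frozenset({1, 2}), frozenset({5, 6})}

print("per-MWE  :", per_mwe_scores(gold, pred))
print("per-token:", per_token_scores(gold, pred))
```

In this toy case the partially matched VMWE earns no credit per-MWE but most of its credit per-token, which is why the choice between the two metrics affects how systems are ranked.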