PARSEME shared task Annotathon
Struga, 7-8 April 2016
==========

Decisions:
* choice of the corpus - OK for newspapers + Wikipedia in the original
* independent double annotation of a part of each corpus (for the IAA calculation)
  - merging after discussion in a separate spreadsheet
  - this could lead to creating a platinum standard 200-sentence corpus from phase 2 (distinct from the original annotations)
* producing language-specific guidelines (language-specific examples not compulsory for the universal categories) - undiscussed
* canceling hesitation labels, use of confidence scores - agreed

Difficult cases:
- we don't annotate ellipsis of the type "he made one the decision and she two"
- coreference "the decision ... she made it" should not be annotated
- formulate a recommendation for individual languages on splitting collated prepositions and clitics: haber|se, de(l)la
  * split them if you can (e.g. Italian)
  * if you can't, we should decide something consistent (e.g. annotate such a "mixed" verb but indicate that there is imprecision by a language-specific

Validation script and merging annotation - demo done

FLAT:
- major issue: slow with 200 sentences

PARSEMEbot:
- the annotations are listed but not highlighted in the sentence; they could be highlighted, but only with one color for every MWE
- need: downloading the annotated file to browse it locally

Pre-Vote:
- 5 for FLAT
- 6 for PARSEMEbot
Decision not taken; LLs will test the tools and vote in 2 weeks' time.

Requests for extra features:
- "annotation memory" - highlighting occurrences of potential MWEs that were previously annotated
- browsing through all previous annotations, also in other documents (in order to be consistent)
- possibility of adding comments to annotations

Deadlines:
12 April - file uploaded to FLAT
13 April - LGLs send emails to LLs
19 April - deadline for tests
20 April - LLs send votes to Veronika and Federico

====
TODO:

Agata:
- notify Behrang about the encoding issues in the tokenizer for Greek (errors are only visible to the validation script, when it's already too late to correct them)
+ send these notes

Federico:
+ add "platinium" spreadsheet to each language - a copy of "merge"
- make "merge" non-editable
+ upload the English corpus from pilot 2 (collated) to FLAT

Veronika:
+ send instructions to LGLs about testing
+ collect votes

LGLs:
+ send instructions to the LLs about testing