Coreference annotation schema for an inflectional language

  • Authors: Maciej Ogrodniczuk; Magdalena Zawisławska; Katarzyna Głowińska; Agata Savary

  • Affiliations: Institute of Computer Science, Polish Academy of Sciences, Poland; Institute of Polish Language, Warsaw University, Poland; Lingventa, Poland; Laboratoire d'informatique, François Rabelais University Tours, France

  • Venue: CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
  • Year: 2013

Abstract

Creating a coreference corpus for an inflectional, free-word-order language is challenging because of syntactic features that existing annotation guidelines largely ignore: the absence of definite and indefinite articles (which makes quasi-anaphoricity very common), the frequent use of zero subjects, and discrepancies between syntactic and semantic heads. This paper reports on the experience gained in preparing such a resource for an ongoing project (CORE), which aims to create tools for coreference resolution. We present the process of building what is, to the best of our knowledge, the first corpus of general coreference for Polish, starting with a clarification of the relation between noun groups and mentions, continuing with the definition of the annotation scope and strategies, and ending with the actual decisions made for borderline cases.