Addressing the resource bottleneck to create large-scale annotated texts

  • Authors:
  • Jon Chamberlain;Massimo Poesio;Udo Kruschwitz

  • Affiliations:
  • University of Essex, UK;University of Essex, UK & Università di Trento, Italy;University of Essex, UK

  • Venue:
  • STEP '08 Proceedings of the 2008 Conference on Semantics in Text Processing
  • Year:
  • 2008

Abstract

Large-scale linguistically annotated resources have become available in recent years. This is partly due to sophisticated automatic and semiautomatic approaches that work well on specific tasks such as part-of-speech tagging. For more complex linguistic phenomena like anaphora resolution, there are no tools that result in high-quality annotations without massive user intervention. Annotated corpora of the size needed for modern computational linguistics research cannot, however, be created by small groups of hand annotators. The ANAWIKI project strikes a balance between collecting high-quality annotations from experts and applying a game-like approach to collecting linguistic annotation from the general Web population. More generally, ANAWIKI is a project that explores to what extent expert annotations can be substituted by a critical mass of non-expert judgements.