Addressing the resource bottleneck to create large-scale annotated texts

Authors:
Jon Chamberlain;Massimo Poesio;Udo Kruschwitz
Affiliations:
University of Essex, UK;University of Essex, UK & Università di Trento, Italy;University of Essex, UK
Venue:
STEP '08 Proceedings of the 2008 Conference on Semantics in Text Processing
Year:
2008

Abstract

Large-scale linguistically annotated resources have become available in recent years. This is partly due to sophisticated automatic and semi-automatic approaches that work well on specific tasks such as part-of-speech tagging. For more complex linguistic phenomena such as anaphora resolution, there are no tools that produce high-quality annotations without massive user intervention. Annotated corpora of the size needed for modern computational linguistics research cannot, however, be created by small groups of hand annotators. The ANAWIKI project strikes a balance between collecting high-quality annotations from experts and applying a game-like approach to collecting linguistic annotation from the general Web population. More generally, ANAWIKI is a project that explores to what extent expert annotations can be substituted by a critical mass of non-expert judgements.