Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. Obtaining these annotations requires substantial manual effort, which makes the creation of such resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and to ask human annotators to correct them where necessary. However, it is not clear to what extent such automatic pre-annotation actually reduces human annotation effort, or what impact it has on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotations of differing quality on annotation time, consistency, and accuracy. While we found no conclusive evidence that automatic pre-annotation speeds up human annotation, we did find that it increases the overall quality of the annotations.
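To make the described workflow concrete, the sketch below illustrates the general pre-annotation loop: a trained classifier proposes a frame label for each target instance, and a human annotator then confirms or corrects each suggestion. This is a minimal, illustrative sketch only; the classifier and annotator interfaces (`classifier.predict`, `annotator.review`) are hypothetical placeholders and are not the system used in the experiment.

```python
# Minimal sketch of a semi-automatic annotation loop, assuming a hypothetical
# frame classifier and annotation interface; all names are illustrative only.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    sentence: str
    target_verb: str
    pre_label: Optional[str] = None   # label suggested by the automatic system
    gold_label: Optional[str] = None  # label confirmed or corrected by the annotator


def pre_annotate(instances: List[Instance], classifier) -> List[Instance]:
    """Assign (possibly erroneous) frame labels with a trained classifier."""
    for inst in instances:
        inst.pre_label = classifier.predict(inst.sentence, inst.target_verb)
    return instances


def correct(instances: List[Instance], annotator) -> List[Instance]:
    """Present each pre-labeled instance to a human, who keeps or overwrites the label."""
    for inst in instances:
        inst.gold_label = annotator.review(
            inst.sentence, inst.target_verb, suggested=inst.pre_label
        )
    return instances
```

In this setup, the quality of the pre-annotation can be varied (e.g., by using classifiers trained on more or less data) to study its effect on annotation time, consistency, and accuracy, which mirrors the comparison described in the abstract.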