Influence of pre-annotation on POS-tagged corpus development

  • Authors:
  • Karën Fort; Benoît Sagot

  • Affiliations:
  • Nancy / Paris, France; Paris, France

  • Venue:
  • LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop
  • Year:
  • 2010

Abstract

This article details a series of carefully designed experiments that evaluate the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, in terms of both quality and annotation time, with particular attention to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus (Marcus et al., 1993) under various experimental setups, either from scratch or starting from various pre-annotations. These experiments confirm and detail the gain in quality observed in earlier work (Marcus et al., 1993; Dandapat et al., 2009; Rehbein et al., 2009), while showing that biases do appear and should be taken into account. Finally, they demonstrate that even a tagger that is not very accurate can help improve annotation speed.