Assessing agreement on classification tasks: the kappa statistic
Computational Linguistics
Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
Inter-coder agreement for computational linguistics
Computational Linguistics
Semi-supervised training for the averaged perceptron POS tagger
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Complex linguistic annotation - no easy way out! A case from Bangla and Hindi POS labeling tasks
ACL-IJCNLP '09 Proceedings of the Third Linguistic Annotation Workshop
Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation
ACL-IJCNLP '09 Proceedings of the Third Linguistic Annotation Workshop
A prototype tool set to support machine-assisted annotation
BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
This article details a series of carefully designed experiments aimed at evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, in terms of both annotation quality and annotation time, with specific attention to the biases pre-annotation may introduce. For this purpose, we manually annotated parts of the Penn Treebank corpus (Marcus et al., 1993) under various experimental setups, either from scratch or starting from various pre-annotations. These experiments confirm and detail the gain in quality observed in earlier work (Marcus et al., 1993; Dandapat et al., 2009; Rehbein et al., 2009), while showing that biases do appear and should be taken into account. Finally, they demonstrate that even a tagger of modest accuracy can help improve annotation speed.
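Annotation quality in studies like this one is typically reported with chance-corrected agreement coefficients such as the kappa statistic cited above. As an illustration only (the abstract does not state which coefficient the authors used), here is a minimal sketch of Cohen's kappa for two annotators' POS tag sequences; the tag sequences are hypothetical:

```python
from collections import Counter

def cohens_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators' tag sequences of equal length."""
    assert len(tags_a) == len(tags_b) and tags_a
    n = len(tags_a)
    # Observed agreement: fraction of tokens both annotators tag identically.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected chance agreement, from each annotator's marginal tag distribution.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(freq_a[t] * freq_b.get(t, 0) for t in freq_a) / n ** 2
    # Kappa: how far observed agreement exceeds chance, scaled to [.., 1].
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two annotators labeling the same 10 tokens.
a = ["NN", "VB", "NN", "DT", "NN", "VB", "JJ", "NN", "DT", "VB"]
b = ["NN", "VB", "NN", "DT", "JJ", "VB", "JJ", "NN", "DT", "NN"]
print(round(cohens_kappa(a, b), 3))  # raw agreement is 0.8; kappa is lower
```

Because kappa discounts the agreement expected by chance, it is a stricter measure than raw accuracy, which matters when comparing annotations produced from scratch against those produced from (possibly biasing) pre-annotations.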