Manual document annotation is an essential technique for knowledge acquisition and capture. Creating high-quality annotations is difficult because of inter-annotator discrepancy: annotators never agree completely on what to annotate or exactly how to annotate it. Traditionally, this is addressed by having multiple domain experts work on the same annotation task iteratively and collaboratively, identifying and resolving discrepancies progressively. However, this process is often ineffective despite the significant time and effort it demands, and in many cases discrepancies remain high. This paper proposes an alternative approach to document annotation. It tackles the problem by first studying annotators' suitability for the types of information to be annotated; then identifying and isolating the most inconsistent annotators, who tend to cause the majority of discrepancies in a task; and finally distributing the annotation workload among the most suitable annotators. Tested on a named entity annotation task in the archaeology domain, the approach produces a larger amount of higher-quality annotations than the traditional approach, resulting in higher machine learning accuracy while requiring significantly less time and effort.
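The step of identifying the most inconsistent annotators can be illustrated with a minimal sketch: compute pairwise Cohen's kappa between annotators' labels and flag the annotator with the lowest mean agreement. The annotator names, the toy "SITE"/"O" label scheme, and the selection rule below are illustrative assumptions, not the paper's actual method.

```python
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences over the same items."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n  # raw agreement rate
    labels = set(a) | set(b)
    # chance agreement from each annotator's label distribution
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

def most_inconsistent(annotations):
    """Return the annotator with the lowest mean pairwise kappa."""
    names = list(annotations)
    scores = {name: [] for name in names}
    for x, y in combinations(names, 2):
        k = cohen_kappa(annotations[x], annotations[y])
        scores[x].append(k)
        scores[y].append(k)
    return min(names, key=lambda n: sum(scores[n]) / len(scores[n]))

# Hypothetical labels for five tokens: is each an archaeological "SITE" entity?
annotations = {
    "ann1": ["SITE", "O", "SITE", "O", "O"],
    "ann2": ["SITE", "O", "SITE", "O", "SITE"],
    "ann3": ["O", "SITE", "O", "SITE", "O"],
}
print(most_inconsistent(annotations))  # prints "ann3": it disagrees most
```

Isolating the flagged annotator and redistributing their workload among the remaining annotators is then a straightforward follow-up step.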