Manual document annotation is an essential technique for knowledge acquisition and capture. Creating high-quality annotations is difficult because of inter-annotator discrepancy: annotators never agree completely on what to annotate or exactly how to annotate it. Traditionally, this is addressed by having multiple domain experts work on the same annotation task iteratively and collaboratively, identifying and resolving discrepancies progressively. However, this process is often ineffective despite the significant time and effort it demands, and in many cases discrepancies remain high. This paper proposes an alternative approach to document annotation. It tackles the problem by first studying annotators' suitability for the types of information to be annotated; then identifying and isolating the most inconsistent annotators, who tend to cause the majority of discrepancies in a task; and finally distributing the annotation workload among the most suitable annotators. Tested on a named entity annotation task in the archaeology domain, the approach produces a larger amount of higher-quality annotations than the traditional approach, resulting in higher machine learning accuracy while requiring significantly less time and effort.
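The step of identifying the most inconsistent annotators can be illustrated with a minimal sketch: compute pairwise Cohen's kappa between annotators' labels and flag the annotator with the lowest mean agreement. The annotator names, the toy "SITE"/"O" label scheme, and the selection rule below are illustrative assumptions, not the paper's actual method.

```python
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences over the same items."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n  # raw agreement rate
    labels = set(a) | set(b)
    # chance agreement from each annotator's label distribution
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

def most_inconsistent(annotations):
    """Return the annotator with the lowest mean pairwise kappa."""
    names = list(annotations)
    scores = {name: [] for name in names}
    for x, y in combinations(names, 2):
        k = cohen_kappa(annotations[x], annotations[y])
        scores[x].append(k)
        scores[y].append(k)
    return min(names, key=lambda n: sum(scores[n]) / len(scores[n]))

# Hypothetical labels for five tokens: is each an archaeological "SITE" entity?
annotations = {
    "ann1": ["SITE", "O", "SITE", "O", "O"],
    "ann2": ["SITE", "O", "SITE", "O", "SITE"],
    "ann3": ["O", "SITE", "O", "SITE", "O"],
}
print(most_inconsistent(annotations))  # prints "ann3": it disagrees most
```

Isolating the flagged annotator and redistributing their workload among the remaining annotators is then a straightforward follow-up step.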