This article discusses the transition from annotated data to a gold standard, that is, a subset that is noise-free with high confidence. Unless appropriately reinterpreted, agreement coefficients do not indicate the quality of a data set as a benchmarking resource: high overall agreement is neither sufficient nor necessary for distilling a highly reliable subset from the annotated material. A mathematical framework is developed that allows the noise level of the agreed subset of the annotated data to be estimated, supporting more cautious benchmarking.
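The article's actual framework is not reproduced here, but the core intuition — that agreement filters noise without eliminating it — can be sketched with a toy model. The two-annotator, independent-error, binary-label assumptions below are illustrative only and are not the article's formulation:

```python
def agreed_subset_noise(eps_a: float, eps_b: float) -> float:
    """Toy estimate of the noise level of the agreed subset.

    Assumes two annotators assigning binary labels, erring independently
    with rates eps_a and eps_b. Agreement occurs when both are right or
    both are wrong; only the latter case leaves an undetected error in
    the agreed subset, so the residual noise is the conditional
    probability of both being wrong given that they agree.
    """
    p_both_right = (1 - eps_a) * (1 - eps_b)
    p_both_wrong = eps_a * eps_b
    return p_both_wrong / (p_both_right + p_both_wrong)
```

Under these assumptions, two annotators who each err 10% of the time yield an agreed subset with roughly 1.2% residual noise — much cleaner than either annotator alone, yet not noise-free, which is the point the abstract makes about benchmarking.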