This article discusses the transition from annotated data to a gold standard, that is, a subset that is noise-free with high confidence. Unless appropriately reinterpreted, agreement coefficients do not indicate the quality of a data set as a benchmarking resource: high overall agreement is neither sufficient nor necessary for distilling a highly reliable subset from the annotated material. A mathematical framework is developed that allows the noise level of the agreed subset of the annotated data to be estimated, supporting more cautious benchmarking.
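The article's actual framework is not reproduced here, but the core intuition — that agreement filters noise without eliminating it — can be sketched with a toy model. The two-annotator, independent-error, binary-label assumptions below are illustrative only and are not the article's formulation:

```python
def agreed_subset_noise(eps_a: float, eps_b: float) -> float:
    """Toy estimate of the noise level of the agreed subset.

    Assumes two annotators assigning binary labels, erring independently
    with rates eps_a and eps_b. Agreement occurs when both are right or
    both are wrong; only the latter case leaves an undetected error in
    the agreed subset, so the residual noise is the conditional
    probability of both being wrong given that they agree.
    """
    p_both_right = (1 - eps_a) * (1 - eps_b)
    p_both_wrong = eps_a * eps_b
    return p_both_wrong / (p_both_right + p_both_wrong)
```

Under these assumptions, two annotators who each err 10% of the time yield an agreed subset with roughly 1.2% residual noise — much cleaner than either annotator alone, yet not noise-free, which is the point the abstract makes about benchmarking.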