We address the problem of distinguishing between two sources of disagreement in annotations: genuine subjectivity and slips of attention. The latter are especially likely when the classification task has a default class, as in tasks where annotators must find instances of a phenomenon of interest, such as the metaphor detection task discussed here. We apply and extend a data analysis technique proposed by Beigman Klebanov and Shamir (2006), first distilling reliably deliberate (non-chance) annotations and then estimating the proportion of attention slips versus genuine disagreement within them.
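The distillation step can be illustrated with a simplified binomial model of chance annotation. The sketch below is a hypothetical illustration under assumed parameters, not the exact procedure of Beigman Klebanov and Shamir (2006): it assumes each of n annotators independently marks an item by chance with probability p, and finds the smallest agreement threshold k at which the expected number of purely chance-driven items among the retained set falls below a chosen bound. The names `chance_tail_prob`, `reliability_threshold`, and `max_expected_chance` are illustrative, not from the paper.

```python
import math

def chance_tail_prob(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the probability that at least
    k of n annotators mark a given item purely by chance."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def reliability_threshold(n: int, p: float, n_items: int,
                          max_expected_chance: float = 1.0) -> int:
    """Smallest agreement threshold k such that, over n_items items, the
    expected number of items marked by >= k annotators through chance
    alone stays below max_expected_chance."""
    for k in range(1, n + 1):
        if n_items * chance_tail_prob(n, k, p) <= max_expected_chance:
            return k
    return n

# Hypothetical setting: 9 annotators, each marking ~5% of items by
# chance, over 1000 items.
n, p, n_items = 9, 0.05, 1000
k = reliability_threshold(n, p, n_items)
print(k, chance_tail_prob(n, k, p))  # -> 4, ~0.0007
```

Under this model, items marked by at least k annotators are treated as reliably deliberate; the remaining disagreement within that set is then what must be apportioned between attention slips and genuine subjectivity.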