We address the problem of distinguishing between two sources of disagreement in annotations: genuine subjectivity and slips of attention. The latter are especially likely when the classification task has a default class, as in tasks where annotators must find instances of a phenomenon of interest, such as the metaphor detection task discussed here. We apply and extend a data analysis technique proposed by Beigman Klebanov and Shamir (2006), first distilling reliably deliberate (non-chance) annotations and then estimating the proportion of attention slips versus genuine disagreement within them.
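The distillation step can be illustrated with a simplified binomial model of chance annotation. The sketch below is a hypothetical illustration under assumed parameters, not the exact procedure of Beigman Klebanov and Shamir (2006): it assumes each of n annotators independently marks an item by chance with probability p, and finds the smallest agreement threshold k at which the expected number of purely chance-driven items among the retained set falls below a chosen bound. The names `chance_tail_prob`, `reliability_threshold`, and `max_expected_chance` are illustrative, not from the paper.

```python
import math

def chance_tail_prob(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the probability that at least
    k of n annotators mark a given item purely by chance."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def reliability_threshold(n: int, p: float, n_items: int,
                          max_expected_chance: float = 1.0) -> int:
    """Smallest agreement threshold k such that, over n_items items, the
    expected number of items marked by >= k annotators through chance
    alone stays below max_expected_chance."""
    for k in range(1, n + 1):
        if n_items * chance_tail_prob(n, k, p) <= max_expected_chance:
            return k
    return n

# Hypothetical setting: 9 annotators, each marking ~5% of items by
# chance, over 1000 items.
n, p, n_items = 9, 0.05, 1000
k = reliability_threshold(n, p, n_items)
print(k, chance_tail_prob(n, k, p))  # -> 4, ~0.0007
```

Under this model, items marked by at least k annotators are treated as reliably deliberate; the remaining disagreement within that set is then what must be apportioned between attention slips and genuine subjectivity.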