Agreement statistics play an important role in the evaluation of coding schemes for discourse and dialogue. Unfortunately, there is a lack of understanding about which agreement measures are appropriate and how their results should be interpreted. In this article, we describe the role of agreement measures and argue that only chance-corrected measures that assume a common distribution of labels for all coders are suitable for measuring agreement in reliability studies. We then provide recommendations for how reliability should be inferred from the results of agreement statistics.
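To make the distinction concrete, the following is a minimal sketch (not taken from the article) contrasting two chance-corrected agreement coefficients for two coders: Cohen's kappa, which estimates chance agreement from each coder's individual label distribution, and Scott's pi, which assumes a common (pooled) label distribution for all coders, the property the abstract argues is required in reliability studies. The function names and the example dialogue-act annotations are hypothetical.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Proportion of items on which the two coders assign the same label."""
    assert len(labels_a) == len(labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Chance agreement estimated from each coder's own label distribution."""
    n = len(labels_a)
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    p_e = sum((dist_a[l] / n) * (dist_b[l] / n) for l in labels)
    p_o = observed_agreement(labels_a, labels_b)
    return (p_o - p_e) / (1 - p_e)


def scotts_pi(labels_a, labels_b):
    """Chance agreement estimated from the pooled (common) label distribution."""
    n = len(labels_a)
    pooled = Counter(labels_a) + Counter(labels_b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    p_o = observed_agreement(labels_a, labels_b)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical dialogue-act annotations from two coders.
coder_1 = ["statement", "question", "statement", "backchannel", "statement", "question"]
coder_2 = ["statement", "question", "backchannel", "backchannel", "statement", "statement"]

print("observed agreement:", observed_agreement(coder_1, coder_2))
print("Cohen's kappa:     ", cohens_kappa(coder_1, coder_2))
print("Scott's pi:        ", scotts_pi(coder_1, coder_2))
```

On this toy data the two coefficients differ only slightly, but whenever the coders' label distributions diverge, kappa's per-coder chance estimate and pi's pooled estimate can lead to different conclusions about whether the coding scheme is reliable.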