Agreement statistics play an important role in the evaluation of coding schemes for discourse and dialogue. Unfortunately, there is a lack of understanding about which agreement measures are appropriate and how their results should be interpreted. In this article, we describe the role of agreement measures and argue that only chance-corrected measures that assume a common distribution of labels for all coders are suitable for measuring agreement in reliability studies. We then provide recommendations for how reliability should be inferred from the results of agreement statistics.
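To make the distinction concrete, the following is a minimal sketch (not taken from the article) contrasting two chance-corrected agreement coefficients for two coders: Cohen's kappa, which estimates chance agreement from each coder's individual label distribution, and Scott's pi, which assumes a common (pooled) label distribution for all coders, the property the abstract argues is required in reliability studies. The function names and the example dialogue-act annotations are hypothetical.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Proportion of items on which the two coders assign the same label."""
    assert len(labels_a) == len(labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Chance agreement estimated from each coder's own label distribution."""
    n = len(labels_a)
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    p_e = sum((dist_a[l] / n) * (dist_b[l] / n) for l in labels)
    p_o = observed_agreement(labels_a, labels_b)
    return (p_o - p_e) / (1 - p_e)


def scotts_pi(labels_a, labels_b):
    """Chance agreement estimated from the pooled (common) label distribution."""
    n = len(labels_a)
    pooled = Counter(labels_a) + Counter(labels_b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    p_o = observed_agreement(labels_a, labels_b)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical dialogue-act annotations from two coders.
coder_1 = ["statement", "question", "statement", "backchannel", "statement", "question"]
coder_2 = ["statement", "question", "backchannel", "backchannel", "statement", "statement"]

print("observed agreement:", observed_agreement(coder_1, coder_2))
print("Cohen's kappa:     ", cohens_kappa(coder_1, coder_2))
print("Scott's pi:        ", scotts_pi(coder_1, coder_2))
```

On this toy data the two coefficients differ only slightly, but whenever the coders' label distributions diverge, kappa's per-coder chance estimate and pi's pooled estimate can lead to different conclusions about whether the coding scheme is reliable.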