Ranked multidimensional dialogue act annotation
ESSLLI'10 Proceedings of the 2010 international conference on New Directions in Logic, Language and Computation
We present a first analysis of inter-annotator agreement for the DIT++ tagset of dialogue acts, a comprehensive, layered, multidimensional set of 86 tags in which subsets of tags within a dimension or layer are often hierarchically organised. We argue that for such highly structured annotation schemes in particular, the well-known kappa statistic is not an adequate measure of inter-annotator agreement. Instead, we propose a statistic that takes the structural properties of the tagset into account, and we discuss its application in an annotation experiment. The experiment shows promising agreement scores for most dimensions in the tagset and provides useful insights into the usability of the annotation scheme, but it also indicates that several additional factors influence annotator agreement. Finally, we suggest that the proposed per-dimension agreement measure offers a good basis for measuring annotator agreement across all dimensions of a multidimensional annotation scheme.
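The abstract does not spell out the proposed statistic, but its core idea, weighting disagreements by how far apart two tags sit in the tag hierarchy, can be sketched in a few lines. The Python sketch below is an illustration under invented assumptions (a toy four-tag hierarchy and a simple normalised tree distance), not the authors' actual measure and not the DIT++ taxonomy:

from itertools import product

# Toy tag hierarchy (invented for illustration; DIT++ defines its own):
# each tag maps to its parent, None marks the root.
PARENT = {
    "info-seeking": None,
    "propositional-q": "info-seeking",
    "check-q": "propositional-q",
    "set-q": "info-seeking",
}

def ancestors(tag):
    """Path from a tag up to the root, including the tag itself."""
    path = []
    while tag is not None:
        path.append(tag)
        tag = PARENT[tag]
    return path

def tag_distance(a, b):
    """Normalised tree distance in [0, 1]: 0 for identical tags, small for
    tags sharing a deep common ancestor, larger across branches."""
    if a == b:
        return 0.0
    pa, pb = ancestors(a), ancestors(b)
    shared = len(set(pa) & set(pb))  # common ancestors, incl. the lowest one
    steps = (len(pa) - shared) + (len(pb) - shared)  # hops up to that ancestor
    return steps / (len(pa) + len(pb))

def weighted_kappa(pairs, tags):
    """Kappa-style agreement, 1 - D_obs / D_exp, where the disagreement
    between two tags is their tree distance rather than all-or-nothing."""
    n = len(pairs)
    d_obs = sum(tag_distance(a, b) for a, b in pairs) / n
    # Expected disagreement from the two coders' marginal tag distributions.
    p1 = {t: sum(a == t for a, _ in pairs) / n for t in tags}
    p2 = {t: sum(b == t for _, b in pairs) / n for t in tags}
    d_exp = sum(p1[s] * p2[t] * tag_distance(s, t)
                for s, t in product(tags, tags))
    return 1.0 - d_obs / d_exp

# Two hypothetical coders labelling three utterances:
pairs = [("check-q", "propositional-q"), ("set-q", "set-q"),
         ("check-q", "check-q")]
print(weighted_kappa(pairs, list(PARENT)))  # ~0.78; plain kappa gives 0.5 here

Under this weighting, an exact match contributes no disagreement, confusing a tag with its parent contributes little, and confusing tags from different branches contributes most; a plain kappa scores all mismatches identically, which is the inadequacy for structured tagsets that the abstract points to.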