The kappa statistic: a second look

Authors:
Barbara Di Eugenio; Michael Glass
Affiliations:
University of Illinois at Chicago, Computer Science, 1120 SEO (M/C 152), 851 South Morgan Street, Chicago, IL; Valparaiso University, Mathematics and Computer Science, 116 Gellerson Hall, Valparaiso, IN
Venue:
Computational Linguistics
Year:
2004

Abstract

In recent years, the kappa coefficient of agreement has become the de facto standard for evaluating intercoder agreement for tagging tasks. In this squib, we highlight issues that affect κ and that the community has largely neglected. First, we discuss the assumptions underlying different computations of the expected agreement component of κ. Second, we discuss how prevalence and bias affect the κ measure.
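The abstract names two technical issues: how the expected-agreement term P(E) is computed, and how prevalence and bias move κ. The Python sketch below is illustrative only and is not the squib's own code. It computes κ = (P(A) - P(E)) / (1 - P(E)) under two common chance models: one where each coder has an individual label distribution (Cohen-style κ), and one where a single distribution is pooled over both coders' labels (as in Scott's π and Siegel and Castellan's K). It also computes the two-category prevalence index (a - d)/N and bias index (b - c)/N in the form commonly used for 2×2 tables. The helper names `table_to_labels` and `prevalence_and_bias` and the example tables are hypothetical, chosen only for illustration.

```python
from collections import Counter


def observed_agreement(x, y):
    """P(A): share of items on which the two coders chose the same label."""
    if len(x) != len(y) or not x:
        raise ValueError("both coders must label the same non-empty item list")
    return sum(p == q for p, q in zip(x, y)) / len(x)


def expected_agreement(x, y, pooled=False):
    """P(E): agreement expected by chance.

    pooled=False: each coder has an individual label distribution
                  (Cohen-style kappa).
    pooled=True:  one distribution estimated from all 2N labels
                  (Scott's pi / Siegel and Castellan's K).
    """
    n = len(x)
    cx, cy = Counter(x), Counter(y)
    labels = set(cx) | set(cy)
    if pooled:
        return sum(((cx[k] + cy[k]) / (2 * n)) ** 2 for k in labels)
    return sum((cx[k] / n) * (cy[k] / n) for k in labels)


def kappa(x, y, pooled=False):
    """kappa = (P(A) - P(E)) / (1 - P(E))."""
    pa = observed_agreement(x, y)
    pe = expected_agreement(x, y, pooled)
    if pe == 1:
        raise ValueError("kappa is undefined when P(E) = 1")
    return (pa - pe) / (1 - pe)


def table_to_labels(a, b, c, d):
    """Hypothetical helper: expand a 2x2 agreement table into two label lists.

    a: both coders say 'yes'; b: coder 1 'yes', coder 2 'no';
    c: coder 1 'no', coder 2 'yes'; d: both coders say 'no'.
    """
    x = ["yes"] * (a + b) + ["no"] * (c + d)
    y = ["yes"] * a + ["no"] * b + ["yes"] * c + ["no"] * d
    return x, y


def prevalence_and_bias(a, b, c, d):
    """Hypothetical helper: prevalence index (a - d)/N, bias index (b - c)/N."""
    n = a + b + c + d
    return (a - d) / n, (b - c) / n


if __name__ == "__main__":
    # Made-up (a, b, c, d) tables, 100 items each.
    for table in [(45, 5, 5, 45), (85, 5, 5, 5), (45, 15, 5, 35)]:
        x, y = table_to_labels(*table)
        pi, bi = prevalence_and_bias(*table)
        print(table,
              f"P(A)={observed_agreement(x, y):.2f}",
              f"kappa_cohen={kappa(x, y):.3f}",
              f"kappa_pooled={kappa(x, y, pooled=True):.3f}",
              f"PI={pi:.2f} BI={bi:.2f}")
```

On these made-up tables, the first two both have P(A) = 0.9, yet κ falls from 0.8 to about 0.444 once most items fall into one category (prevalence index 0.8). The third table has bias (coder 1 says "yes" 60 times, coder 2 only 50), and the two computations diverge: 0.6 with individual distributions versus about 0.596 with the pooled one. The pooled value can never be the higher of the two, because ((p+q)/2)² ≥ pq for every category, so the pooled P(E) is at least as large; the two coincide only when the coders' marginal distributions are identical.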