In recent years, the kappa coefficient of agreement has become the de facto standard for evaluating intercoder agreement for tagging tasks. In this squib, we highlight issues that affect κ and that the community has largely neglected. First, we discuss the assumptions underlying different computations of the expected agreement component of κ. Second, we discuss how prevalence and bias affect the κ measure.
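As an illustration of the two points above, the following sketch computes κ = (P(A) − P(E)) / (1 − P(E)) for two coders under two ways of estimating the expected agreement P(E). The abstract does not name the specific variants, so the choice here is an assumption: one uses each coder's own category distribution (as in Cohen's κ), the other uses a single distribution pooled over both coders (as in Scott's π and Siegel and Castellan's K). The function names, the `from_table` helper, and the toy counts are hypothetical and exist only for this example.

```python
from collections import Counter


def observed_agreement(a, b):
    """P(A): proportion of items on which the two coders assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def expected_per_coder(a, b):
    """P(E) when each coder keeps an individual category distribution (Cohen-style)."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    return sum((ca[k] / n) * (cb[k] / n) for k in set(a) | set(b))


def expected_pooled(a, b):
    """P(E) when both coders share one pooled category distribution (Scott-style)."""
    n = len(a)
    pooled = Counter(a) + Counter(b)
    return sum((count / (2 * n)) ** 2 for count in pooled.values())


def kappa(p_a, p_e):
    """Chance-corrected agreement for a given observed and expected agreement."""
    return (p_a - p_e) / (1 - p_e)


def from_table(yy, yn, ny, nn):
    """Expand a 2x2 contingency table into two label lists (hypothetical helper).

    yy: both say "y"; yn: coder 1 "y", coder 2 "n"; ny: the reverse; nn: both "n".
    """
    coder1 = ["y"] * (yy + yn) + ["n"] * (ny + nn)
    coder2 = ["y"] * yy + ["n"] * yn + ["y"] * ny + ["n"] * nn
    return coder1, coder2


# Bias: the coders' marginal distributions differ (6 vs 4 "y" labels out of 10),
# so the two estimates of P(E), and hence the two kappa values, diverge.
c1, c2 = from_table(3, 3, 1, 3)
p_a_bias = observed_agreement(c1, c2)
kappa_per_coder = kappa(p_a_bias, expected_per_coder(c1, c2))
kappa_pooled = kappa(p_a_bias, expected_pooled(c1, c2))

# Prevalence: both tables have the same observed agreement (0.9), but in the
# second one category dominates, which raises P(E) and lowers kappa.
b1, b2 = from_table(45, 5, 5, 45)
s1, s2 = from_table(85, 5, 5, 5)
kappa_balanced = kappa(observed_agreement(b1, b2), expected_per_coder(b1, b2))
kappa_skewed = kappa(observed_agreement(s1, s2), expected_per_coder(s1, s2))

print(f"bias:       per-coder={kappa_per_coder:.3f}  pooled={kappa_pooled:.3f}")
print(f"prevalence: balanced={kappa_balanced:.3f}  skewed={kappa_skewed:.3f}")
```

In the toy data, the per-coder estimate yields a higher κ than the pooled estimate when the coders' marginals differ, and the skewed table yields a much lower κ than the balanced one despite identical observed agreement. This is why a reported κ is hard to interpret without knowing how P(E) was computed and how the categories are distributed.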