Inter-coder agreement for computational linguistics

Authors:
Ron Artstein; Massimo Poesio
Venue:
Computational Linguistics
Year:
2008


Abstract

This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, although their use makes the value of the coefficient even harder to interpret.
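As an illustration of the kappa-like coefficients named in the abstract, the following minimal Python sketch computes Cohen's kappa and Scott's pi for two coders using their standard textbook definitions: both have the form (A_o - A_e) / (1 - A_e), and they differ only in how expected agreement A_e is estimated. The function names and the toy labels are assumptions made for this example and are not taken from the article. Weighted coefficients such as Krippendorff's alpha are not shown.

```python
from collections import Counter


def observed_agreement(a, b):
    """Proportion of items on which the two coders chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: expected agreement uses each coder's own label distribution."""
    n = len(a)
    a_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    a_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    # Undefined (division by zero) when a_e == 1, i.e. both coders use one label only.
    return (a_o - a_e) / (1 - a_e)


def scott_pi(a, b):
    """Scott's pi: expected agreement uses one distribution pooled over both coders."""
    n = len(a)
    a_o = observed_agreement(a, b)
    pooled = Counter(a) + Counter(b)
    a_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    return (a_o - a_e) / (1 - a_e)


# Hypothetical toy annotations: ten items, two categories.
coder_a = ["Y", "Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N"]
coder_b = ["Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "Y"]

print(round(cohen_kappa(coder_a, coder_b), 3))
print(round(scott_pi(coder_a, coder_b), 3))
```

On this toy data the coders agree on 7 of 10 items. Kappa comes out near 0.40 and pi slightly lower, near 0.39, because pooling the two coders' label distributions gives a slightly higher estimate of chance agreement.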