This article surveys methods for measuring agreement among corpus annotators. It sets out the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, although their use makes the value of the coefficient even harder to interpret.
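The following is a minimal Python sketch that makes the difference between these coefficients concrete. It assumes exactly two coders, no missing labels, and labels supplied as parallel lists. All function names are hypothetical helpers, not a library API. Pi and kappa share the form (A_o - A_e) / (1 - A_e) and differ only in how expected agreement A_e is estimated: pi uses the coders' pooled label distribution, while kappa uses each coder's own distribution. Alpha is written as 1 - D_o / D_e over a distance function. Replacing the all-or-nothing nominal distance with a squared difference therefore turns it into a weighted coefficient.

```python
from collections import Counter
from itertools import product

# Hypothetical helpers: two coders, every item labelled by both.
# All coefficients are undefined (division by zero) if every label is identical.

def observed_agreement(a, b):
    """A_o: proportion of items on which the two coders give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def scott_pi(a, b):
    """Scott's pi: chance agreement estimated from the pooled label distribution."""
    n = len(a)
    pooled = Counter(a) + Counter(b)
    a_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed_agreement(a, b) - a_e) / (1 - a_e)

def cohen_kappa(a, b):
    """Cohen's kappa: chance agreement from each coder's own label distribution."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    a_e = sum((ca[k] / n) * (cb[k] / n) for k in ca.keys() | cb.keys())
    return (observed_agreement(a, b) - a_e) / (1 - a_e)

def krippendorff_alpha(a, b, distance):
    """Krippendorff's alpha = 1 - D_o / D_e for two coders, no missing data.

    `distance(j, l)` is assumed symmetric, with distance(k, k) == 0.
    """
    n = len(a)
    # Observed disagreement: mean distance between the two labels on each item.
    d_o = sum(distance(x, y) for x, y in zip(a, b)) / n
    # Expected disagreement: mean distance over all ordered pairs of distinct
    # label tokens in the pooled sample of 2n labels.
    pooled = Counter(a) + Counter(b)
    total = 2 * n
    d_e = sum(pooled[j] * pooled[l] * distance(j, l)
              for j, l in product(pooled, repeat=2)) / (total * (total - 1))
    return 1 - d_o / d_e

def nominal(j, l):
    """Unweighted distance: every disagreement counts the same."""
    return 0 if j == l else 1

def interval(j, l):
    """Weighted distance for numeric labels: squared difference."""
    return (j - l) ** 2

# Illustrative data (made up): coder A says yes/no 25/25, coder B 30/20.
coder_a = ["yes"] * 25 + ["no"] * 25
coder_b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
print(cohen_kappa(coder_a, coder_b), scott_pi(coder_a, coder_b),
      krippendorff_alpha(coder_a, coder_b, nominal))

# Illustrative ordinal ratings (made up): near-misses differ by one point.
ratings_a = [1, 2, 3, 3, 2, 1]
ratings_b = [1, 3, 3, 2, 2, 1]
print(krippendorff_alpha(ratings_a, ratings_b, nominal),
      krippendorff_alpha(ratings_a, ratings_b, interval))
```

On the yes/no data, kappa and nominal alpha both come out at 0.4, while pi is about 0.394. This is because pi and alpha estimate chance from the pooled labels and kappa does not. On the ordinal ratings, interval alpha (about 0.77) exceeds nominal alpha (about 0.54) because near-misses are only partly penalized. The fact that identical annotations yield different values depending on the chosen distance is one reason weighted coefficients are harder to interpret.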