Manual annotation of natural language to capture linguistic information is essential for NLP tasks involving supervised machine learning of semantic knowledge. Judgments of meaning can be more or less subjective; when they are, the labels assigned to the same instance can vary across annotators with the annotators' knowledge, age, gender, intuitions, background, and so on, rather than converging on a single correct label. We introduce Anveshan, a framework for investigating annotator behavior that identifies outlier annotators, clusters annotators by behavior, and uncovers confusable labels. We also compare the effectiveness of trained annotators against a larger number of untrained annotators on a word sense annotation task. The annotation data comes from a word sense disambiguation task for polysemous words, labeled both by trained annotators and by untrained annotators recruited through Amazon's Mechanical Turk. Our results show that Anveshan is effective in uncovering patterns in annotator behavior, and that trained annotators outperform a larger number of untrained annotators on this task.
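The kind of analysis described above can be illustrated with a minimal sketch: given multiply-labeled instances, compute each annotator's mean pairwise agreement with the others (low values flag outliers) and count how often pairs of labels are assigned to the same instance by different annotators (frequent pairs are candidate confusable senses). The annotator names, instance IDs, and sense labels below are toy placeholders, not data from the paper, and this simple raw-agreement measure stands in for whatever statistics Anveshan actually uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: annotations[annotator][instance] = sense label.
annotations = {
    "a1": {"i1": "s1", "i2": "s2", "i3": "s1", "i4": "s3"},
    "a2": {"i1": "s1", "i2": "s2", "i3": "s1", "i4": "s3"},
    "a3": {"i1": "s2", "i2": "s1", "i3": "s2", "i4": "s1"},
}

def pairwise_agreement(a, b):
    """Fraction of shared instances on which annotators a and b agree."""
    shared = annotations[a].keys() & annotations[b].keys()
    same = sum(annotations[a][i] == annotations[b][i] for i in shared)
    return same / len(shared)

# Mean agreement of each annotator with all others; unusually low
# values flag outlier annotators.
mean_agree = {
    a: sum(pairwise_agreement(a, b) for b in annotations if b != a)
       / (len(annotations) - 1)
    for a in annotations
}

# Label pairs frequently assigned to the same instance by different
# annotators are candidate confusable senses.
confusions = Counter()
for a, b in combinations(annotations, 2):
    for i in annotations[a].keys() & annotations[b].keys():
        la, lb = annotations[a][i], annotations[b][i]
        if la != lb:
            confusions[tuple(sorted((la, lb)))] += 1
```

The same pairwise-agreement scores could also serve as a distance matrix for clustering annotators by behavior; in practice a chance-corrected coefficient such as Cohen's kappa or Krippendorff's alpha would be preferable to raw agreement.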