Supervised machine learning methods for modeling word sense often rely on human labelers to provide a single ground-truth label for each word in context. We examine issues in establishing ground-truth sense labels using the fine-grained sense inventory of WordNet. Our data consist of a corpus of 1,000 sentences: 100 for each of ten moderately polysemous words. Each word was given multiple sense labels (a multilabel) by trained and untrained annotators. Multilabels give a nuanced representation of the degree of agreement on individual instances. We analyze the sets of multilabels with a suite of assessment metrics, such as comparisons of sense distributions across annotators. Our assessment indicates that the general annotation procedure is reliable, but that words differ in how reliably annotators can assign them WordNet sense labels, independent of the number of senses. We also investigate the performance of an unsupervised machine learning method that infers ground-truth labels from various combinations of labels from the trained and untrained annotators. We find tentative support for the hypothesis that performance depends on the quality of the set of multilabels, independent of the number of labelers or their training.
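One of the assessment metrics mentioned above, comparing sense distributions across annotators, can be sketched as a Jensen-Shannon divergence between each annotator's empirical distribution over senses. The sketch below is illustrative only: the sense names and annotator labels are hypothetical, not drawn from the actual corpus, and the paper's full metric suite is not reproduced here.

```python
import math
from collections import Counter

def sense_distribution(labels, senses):
    """Normalized frequency of each sense in one annotator's label set."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [counts.get(s, 0) / total for s in senses]

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Ranges from 0 (identical sense usage) to 1 (disjoint sense usage).
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return (kl(p, m) + kl(q, m)) / 2

# Hypothetical labels from two annotators over the same ten instances
# of one polysemous word with three WordNet senses.
senses = ["sense1", "sense2", "sense3"]
ann_a = ["sense1", "sense1", "sense2", "sense1", "sense3",
         "sense2", "sense1", "sense2", "sense1", "sense3"]
ann_b = ["sense1", "sense2", "sense2", "sense1", "sense3",
         "sense2", "sense2", "sense2", "sense1", "sense3"]

p = sense_distribution(ann_a, senses)
q = sense_distribution(ann_b, senses)
div = jensen_shannon(p, q)
```

A low divergence indicates that two annotators apply the senses with similar relative frequencies even when they disagree on particular instances, which is one way a multilabel analysis can separate distributional agreement from instance-level agreement.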