Supervised machine learning methods for modeling word sense often rely on human labelers to provide a single ground-truth label for each word in context. We examine issues in establishing ground-truth sense labels using the fine-grained sense inventory of WordNet. Our data consist of a corpus of 1,000 sentences: 100 for each of ten moderately polysemous words. Each word was given multiple sense labels (a multilabel) by trained and untrained annotators. Multilabels give a nuanced representation of the degree of agreement on individual instances. We analyze the sets of multilabels with a suite of assessment metrics, such as comparisons of sense distributions across annotators. Our assessment indicates that the general annotation procedure is reliable, but that words differ in how reliably annotators can assign them WordNet sense labels, independent of the number of senses. We also investigate the performance of an unsupervised machine learning method that infers ground-truth labels from various combinations of labels from the trained and untrained annotators. We find tentative support for the hypothesis that performance depends on the quality of the set of multilabels, independent of the number of labelers or their training.
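One of the assessment metrics mentioned above, comparing sense distributions across annotators, can be sketched as a Jensen-Shannon divergence between each annotator's empirical distribution over senses. The sketch below is illustrative only: the sense names and annotator labels are hypothetical, not drawn from the actual corpus, and the paper's full metric suite is not reproduced here.

```python
import math
from collections import Counter

def sense_distribution(labels, senses):
    """Normalized frequency of each sense in one annotator's label set."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [counts.get(s, 0) / total for s in senses]

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Ranges from 0 (identical sense usage) to 1 (disjoint sense usage).
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return (kl(p, m) + kl(q, m)) / 2

# Hypothetical labels from two annotators over the same ten instances
# of one polysemous word with three WordNet senses.
senses = ["sense1", "sense2", "sense3"]
ann_a = ["sense1", "sense1", "sense2", "sense1", "sense3",
         "sense2", "sense1", "sense2", "sense1", "sense3"]
ann_b = ["sense1", "sense2", "sense2", "sense1", "sense3",
         "sense2", "sense2", "sense2", "sense1", "sense3"]

p = sense_distribution(ann_a, senses)
q = sense_distribution(ann_b, senses)
div = jensen_shannon(p, q)
```

A low divergence indicates that two annotators apply the senses with similar relative frequencies even when they disagree on particular instances, which is one way a multilabel analysis can separate distributional agreement from instance-level agreement.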