Relieving the data acquisition bottleneck in word sense disambiguation

Authors: Mona Diab
Affiliations: Stanford University
Venue: ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Year: 2004

Abstract

Supervised learning methods for WSD yield better performance than unsupervised methods. Yet the availability of clean training data for the former is still a severe challenge. In this paper, we present an unsupervised bootstrapping approach for WSD which exploits huge amounts of automatically generated noisy data for training within a supervised learning framework. The method is evaluated using the 29 nouns in the English Lexical Sample task of SENSEVAL 2. Our algorithm does as well as supervised algorithms on 31% of this test set, which is an improvement of 11% (absolute) over state-of-the-art bootstrapping WSD algorithms. We identify seven different factors that impact the performance of our system.
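
As a rough reading of the reported figures, the percentages can be converted into approximate noun counts. The sketch below assumes the percentages are taken over the 29 evaluated nouns, which the abstract does not state explicitly; the names `TOTAL_NOUNS`, `OURS_PCT`, `GAIN_PCT`, and `nouns_at` are illustrative only.

```python
# Assumption: the abstract's percentages are shares of the 29 evaluated nouns.
TOTAL_NOUNS = 29   # nouns in the SENSEVAL 2 English Lexical Sample task
OURS_PCT = 31      # share of the test set where the method matches supervised systems
GAIN_PCT = 11      # absolute improvement over earlier bootstrapping systems


def nouns_at(pct: int, total: int = TOTAL_NOUNS) -> int:
    """Round a percentage of the noun set to a whole number of nouns."""
    return round(pct * total / 100)


# An 11-point absolute gain implies the earlier systems sat at 31 - 11 percent.
baseline_pct = OURS_PCT - GAIN_PCT
ours_nouns = nouns_at(OURS_PCT)
baseline_nouns = nouns_at(baseline_pct)

print(f"earlier bootstrapping systems: {baseline_pct}% (about {baseline_nouns} nouns)")
print(f"this method: {OURS_PCT}% (about {ours_nouns} nouns)")
```

Under that assumption, the reported result corresponds to roughly 9 of the 29 nouns, against roughly 6 for the earlier bootstrapping systems.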