Word sense disambiguation for all words without hard labor

Authors:
Zhi Zhong;Hwee Tou Ng
Affiliations:
Department of Computer Science, National University of Singapore, Singapore;Department of Computer Science, National University of Singapore, Singapore
Venue:
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Year:
2009

Abstract

While the most accurate word sense disambiguation systems are built using supervised learning from sense-tagged data, scaling them up to all words of a language has proved elusive, since preparing a sense-tagged corpus for all words of a language is time-consuming and labor-intensive. In this paper, we propose and implement a completely automatic approach to scale up word sense disambiguation to all words of English. Our approach relies on English-Chinese parallel corpora, English-Chinese bilingual dictionaries, and automatic methods of finding synonyms of Chinese words. No additional human sense annotations or word translations are needed. We conducted a large-scale empirical evaluation on more than 29,000 noun tokens in English texts annotated in OntoNotes 2.0, based on its coarse-grained sense inventory. The evaluation results show that our approach achieves high accuracy, outperforming the first-sense baseline and coming close to a previously reported approach that requires manual human effort to provide Chinese translations of English senses.
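As a rough illustration of how parallel text can stand in for manual sense annotation, the sketch below harvests sense-tagged English examples from word-aligned sentence pairs. It is not the authors' implementation: the sense labels, the `SENSE_TRANSLATIONS` table, the alignment format, and the `harvest_examples` function are all hypothetical, and the toy data is hand-written, whereas the abstract describes obtaining translations automatically from bilingual dictionaries and Chinese synonyms.

```python
from collections import defaultdict

# Hypothetical sense inventory: each English noun sense is paired with the
# Chinese words assumed to translate it. Hand-written toy data for illustration.
SENSE_TRANSLATIONS = {
    "bank": {
        "bank.1": {"银行"},        # financial institution
        "bank.2": {"河岸", "岸"},  # side of a river
    }
}


def harvest_examples(aligned_sentences, sense_translations):
    """Collect sense-tagged English sentences from word-aligned parallel text.

    aligned_sentences: iterable of (english_tokens, alignments), where
    alignments maps an English token index to the Chinese word aligned to it.
    Returns a dict mapping each sense label to a list of English sentences.
    """
    examples = defaultdict(list)
    for tokens, alignments in aligned_sentences:
        for i, token in enumerate(tokens):
            senses = sense_translations.get(token.lower())
            if senses is None or i not in alignments:
                continue
            chinese = alignments[i]
            matches = [s for s, zh_words in senses.items() if chinese in zh_words]
            # Keep the example only when the aligned Chinese word
            # points to exactly one sense.
            if len(matches) == 1:
                examples[matches[0]].append(" ".join(tokens))
    return dict(examples)


corpus = [
    (["She", "deposited", "cash", "at", "the", "bank"], {5: "银行"}),
    (["They", "walked", "along", "the", "river", "bank"], {5: "河岸"}),
    (["The", "bank", "was", "closed"], {1: "机构"}),  # unlisted translation: skipped
]
harvested = harvest_examples(corpus, SENSE_TRANSLATIONS)
```

Examples gathered this way could then serve as training data for a supervised classifier, which is the general idea behind replacing hand-tagged corpora with parallel text.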