Word Sense Disambiguation by Combining Labeled Data Expansion and Semi-Supervised Learning Method

Authors: Sanae Fujita; Akinori Fujino
Affiliation: Nippon Telegraph and Telephone Corporation (both authors)
Venue: ACM Transactions on Asian Language Information Processing (TALIP)
Year: 2013



Abstract

Lack of labeled data is one of the most severe problems facing word sense disambiguation (WSD). We address this problem with a method that combines automatic labeled data expansion (Step 1) and semi-supervised learning (Step 2). Each step is effective on its own, and combining them yields a synergistic effect. In Step 1, we use dictionary example sentences to automatically extract reliable labeled data from raw corpora, even for infrequent and unseen senses (senses that are unlikely to appear in manually labeled data). In Step 2, we apply a semi-supervised classifier and obtain a further improvement by exploiting easily obtained unlabeled data. We also show that this step can predict even unseen senses. We evaluate the method on the SemEval-2010 Japanese WSD task, a lexical sample task. Step 1 and Step 2 each outperformed the best published result (76.4%), and the combined method achieved a much higher accuracy (84.2%). In this experiment, up to 50% of unseen senses were classified correctly. Because the number of unseen senses in the task is small, we also deleted one sense per word from the labeled data and applied the proposed method; the results show that the method is effective and robust even for unseen senses.
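The two-step pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how "reliable" examples are extracted or which semi-supervised classifier is used, so everything here is an assumption: Step 1 is approximated by word-overlap (Jaccard) matching against dictionary example sentences, keeping a match only if it clears a score threshold and beats the runner-up sense by a margin, and Step 2 is stood in for by self-training a multinomial naive Bayes classifier. The function names (`expand_labeled_data`, `self_train`), the thresholds, and the English toy data are hypothetical. The toy data also shows why the combination can help with unseen senses: `bank/river` never appears in the hand-labeled set, so a classifier trained on that set alone can only ever predict `bank/finance`, whereas Step 1 supplies river examples that Step 2 then builds on.

```python
import math
from collections import Counter, defaultdict


def tokens(sentence):
    # Whitespace tokenization for the toy English data below; Japanese text
    # would need a morphological analyzer instead (assumption).
    return sentence.lower().split()


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def expand_labeled_data(raw_sentences, dictionary_examples, threshold=0.3, margin=0.1):
    """Step 1 sketch (hypothetical): give a raw sentence the sense whose dictionary
    example it overlaps most, keeping it only if the match is strong AND clearly
    ahead of the runner-up sense (a stand-in for "reliable")."""
    expanded = []
    for sent in raw_sentences:
        toks = tokens(sent)
        scores = sorted(
            ((max(jaccard(toks, tokens(ex)) for ex in examples), sense)
             for sense, examples in dictionary_examples.items()),
            reverse=True,
        )
        best_score, best_sense = scores[0]
        runner_up = scores[1][0] if len(scores) > 1 else 0.0
        if best_score >= threshold and best_score - runner_up >= margin:
            expanded.append((sent, best_sense))
    return expanded


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over bag-of-words features."""

    def fit(self, labeled):
        self.sense_counts = Counter(sense for _, sense in labeled)
        self.word_counts = defaultdict(Counter)
        for sent, sense in labeled:
            self.word_counts[sense].update(tokens(sent))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, sent):
        total = sum(self.sense_counts.values())
        log_scores = {}
        for sense, n in self.sense_counts.items():
            counts = self.word_counts[sense]
            denom = sum(counts.values()) + len(self.vocab)
            log_scores[sense] = math.log(n / total) + sum(
                math.log((counts[w] + 1) / denom) for w in tokens(sent))
        top = max(log_scores.values())  # subtract max for numerical stability
        unnorm = {s: math.exp(v - top) for s, v in log_scores.items()}
        z = sum(unnorm.values())
        return {s: v / z for s, v in unnorm.items()}

    def predict(self, sent):
        probs = self.predict_proba(sent)
        return max(probs, key=probs.get)


def self_train(labeled, unlabeled, confidence=0.9, max_rounds=5):
    """Step 2 stand-in (not the article's classifier): repeatedly add unlabeled
    sentences whose predicted sense has posterior >= confidence, then retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = NaiveBayes().fit(labeled)
        confident, rest = [], []
        for sent in pool:
            probs = model.predict_proba(sent)
            sense = max(probs, key=probs.get)
            if probs[sense] >= confidence:
                confident.append((sent, sense))
            else:
                rest.append(sent)
        if not confident:
            break
        labeled.extend(confident)
        pool = rest
    return NaiveBayes().fit(labeled)


# Toy data: "bank/river" never occurs in the hand-labeled set (an unseen sense).
dictionary_examples = {
    "bank/finance": ["deposit money at the bank", "the bank approved the loan"],
    "bank/river": ["fish along the river bank", "the grassy bank of the stream"],
}
hand_labeled = [("she opened an account at the bank", "bank/finance")]
raw = ["we sat on the river bank to fish", "the bank approved my loan today", "the bank is open"]
unlabeled = ["the boat drifted to the river bank", "he withdrew cash from the bank"]

expanded = expand_labeled_data(raw, dictionary_examples)  # "the bank is open" is too ambiguous to keep
model = self_train(hand_labeled + expanded, unlabeled, confidence=0.55)
print(model.predict("children played by the river bank"))  # bank/river
print(model.predict("the bank approved a new loan"))       # bank/finance
```

Self-training is used here only because it is the simplest semi-supervised scheme to write down, and it should not be read as the classifier used in the article. With such tiny data the naive Bayes posteriors stay close to 0.5, which is why the usage lowers `confidence` to 0.55; on realistic data a stricter threshold limits the noise that self-labeled examples feed back into training.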