Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary

Authors:
Yeohoon Yoon; Choong-Nyoung Seon; Songwook Lee; Jungyun Seo
Affiliations:
Yeohoon Yoon, Choong-Nyoung Seon, Jungyun Seo: Department of Computer Science, Sogang University, 1 Sinsu-dong, Mapo-gu, Seoul, Korea
Songwook Lee: Division of Computer Engineering, Dongseo University, San 69-1 Jurye-dong, Sasang-gu, Busan 617-716, Korea
Venue:
Information Processing and Management: an International Journal
Year:
2006

Abstract

Word sense disambiguation (WSD) aims to assign the most appropriate sense to a polysemous word according to its context. We present a method for automatic WSD that uses only two resources: a raw text corpus and a machine-readable dictionary (MRD). The system learns a similarity matrix over word pairs from the unlabeled corpus and uses it to derive vector representations of the sense definitions in the MRD. To disambiguate every occurrence of a polysemous word in a sentence, the system constructs a separate acyclic weighted digraph (AWD) for each occurrence. The AWD is structured around the senses of the context words that occur with the target word in the sentence. Once the AWD for a polysemous word is built, the system searches for its optimal path with the Viterbi algorithm and assigns the target word the sense that lies on that path. In experiments, the system achieves 76.4% accuracy on semantically ambiguous Korean words.
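The abstract outlines the pipeline but not its formulas, so the Python sketch below shows one plausible reading under explicit assumptions. It is not the authors' method. Everything in it is hypothetical and chosen only for illustration: the names `learn_similarity`, `sense_vector`, and `disambiguate`; cosine similarity between sentence-level co-occurrence vectors as a stand-in for the learned similarity matrix; a sense vector built by summing the similarity rows of its definition words; a graph layered in sentence order, with cosine edge weights between senses of consecutive dictionary words; and the toy English corpus and dictionary, which replace the Korean data.

```python
import math
from collections import defaultdict
from itertools import combinations

Vector = dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def learn_similarity(corpus: list[list[str]]) -> dict[str, Vector]:
    """Assumed stand-in for the learned word-pair similarity matrix:
    cosine similarity between sentence-level co-occurrence count vectors."""
    cooc: dict[str, Vector] = defaultdict(lambda: defaultdict(float))
    for sentence in corpus:
        for a, b in combinations(sorted(set(sentence)), 2):
            cooc[a][b] += 1.0
            cooc[b][a] += 1.0
    sim: dict[str, Vector] = {w: {w: 1.0} for w in cooc}
    for a, b in combinations(list(cooc), 2):
        s = cosine(cooc[a], cooc[b])
        if s > 0.0:
            sim[a][b] = sim[b][a] = s
    return sim


def sense_vector(definition: list[str], sim: dict[str, Vector]) -> Vector:
    """Assumed: a sense vector is the sum of the similarity-matrix rows
    of the words in its MRD definition."""
    vec: Vector = defaultdict(float)
    for word in definition:
        for other, s in sim.get(word, {}).items():
            vec[other] += s
    return dict(vec)


def disambiguate(sentence, target, mrd, sim):
    """Build a layered acyclic weighted digraph (one layer of sense nodes per
    dictionary word, edges only between consecutive layers) and return the
    target's sense on the maximum-weight path found by Viterbi-style DP.
    If the target occurs twice, the first occurrence is used."""
    layers = [
        (word, [(sid, sense_vector(defn, sim)) for sid, defn in mrd[word].items()])
        for word in sentence
        if word in mrd
    ]
    words = [word for word, _ in layers]
    if target not in words:
        raise ValueError(f"{target!r} has no dictionary entry in this sentence")
    # best[sense_id] = (score of best path ending at this node, that path)
    best = {sid: (0.0, [sid]) for sid, _ in layers[0][1]}
    prev = layers[0][1]
    for _, senses in layers[1:]:
        new_best = {}
        for sid, vec in senses:
            # Keep only the best incoming path to each node (Viterbi step).
            score, path = max(
                ((best[p][0] + cosine(p_vec, vec), best[p][1]) for p, p_vec in prev),
                key=lambda t: t[0],
            )
            new_best[sid] = (score, path + [sid])
        best, prev = new_best, senses
    _, path = max(best.values(), key=lambda t: t[0])
    return path[words.index(target)]


# Toy, hypothetical data: MRD maps word -> {sense id -> definition words}.
CORPUS = [
    ["bank", "river", "water"],
    ["river", "water", "fish"],
    ["bank", "money", "loan"],
    ["money", "loan", "interest"],
]
MRD = {
    "bank": {"bank_1": ["river", "water"], "bank_2": ["money", "loan"]},
    "fish": {"fish_1": ["river", "water"]},
    "cash": {"cash_1": ["money", "loan"]},
}
SIM = learn_similarity(CORPUS)
print(disambiguate(["fish", "near", "bank"], "bank", MRD, SIM))  # → bank_1
print(disambiguate(["cash", "in", "bank"], "bank", MRD, SIM))    # → bank_2
```

Because edges point only from one layer to the next, the graph is acyclic, and keeping the single best incoming path per node is enough to recover the maximum-weight path. That property is what makes a Viterbi-style search applicable here.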