Unsupervised WSD by finding the predominant sense using context as a dynamic thesaurus

Authors:
Javier Tejada-Cárcamo;Hiram Calvo;Alexander Gelbukh;Kazuo Hara
Affiliations:
San Pablo Catholic University, Arequipa, Peru;Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico and Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan;Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico;Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan
Venue:
Journal of Computer Science and Technology
Year:
2010

Citing 7

Dimensions of meaning
Proceedings of the 1992 ACM/IEEE conference on Supercomputing

Automatic retrieval and clustering of similar words
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2

Finding predominant word senses in untagged text
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

Improving Unsupervised WSD with a Dynamic Thesaurus
TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue

WordNet: similarity - measuring the relatedness of concepts
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence

Using information content to evaluate semantic similarity in a taxonomy
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1

Using measures of semantic relatedness for word sense disambiguation
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing

Abstract

We present and analyze an unsupervised method for Word Sense Disambiguation (WSD). Our work is based on the method presented by McCarthy et al. in 2004 for finding the predominant sense of each word in an entire corpus. Their maximization algorithm lets weighted terms (similar words) from a distributional thesaurus accumulate a score for each sense of an ambiguous word; that is, the sense with the highest score is chosen based on votes from a weighted list of terms related to the ambiguous word. This list is obtained using the distributional similarity method proposed by Dekang Lin for building a thesaurus. In the method of McCarthy et al., every occurrence of the ambiguous word uses the same thesaurus, regardless of the context in which the word occurs. Our method takes context into account when determining the sense of an ambiguous word by building the list of distributionally similar words from the syntactic context of the ambiguous word. We obtain a top precision of 77.54%, versus 67.10% for the original method, tested on SemCor. We also analyze the effect of the number of weighted terms on the tasks of finding the Most Frequent Sense (MFS) and WSD, and experiment with several corpora for building the Word Space Model.
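The voting scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The sense labels, the toy similarity table, the term weights, and the function names are all hypothetical. Normalizing each term's vote across senses is an assumption made here so that every term contributes at most its own weight. In practice the sense-to-term similarity would come from a WordNet-based relatedness measure and the weights from a distributional thesaurus.

```python
from collections import defaultdict


def predominant_sense(senses, weighted_terms, sense_similarity):
    """Pick the sense that collects the highest weighted vote.

    senses: list of sense labels for the ambiguous word.
    weighted_terms: list of (term, weight) pairs, e.g. thesaurus neighbours.
    sense_similarity: callable (sense, term) -> non-negative float.
    """
    scores = defaultdict(float)
    for term, weight in weighted_terms:
        sims = {s: sense_similarity(s, term) for s in senses}
        total = sum(sims.values())
        if total == 0:
            continue  # this term is unrelated to every sense, so it casts no vote
        for s, sim in sims.items():
            # Assumption: each term splits its weight across the senses
            # in proportion to its similarity to them.
            scores[s] += weight * sim / total
    best = max(senses, key=lambda s: scores[s])
    return best, dict(scores)


# Hypothetical similarity values, standing in for a WordNet-based measure.
TOY_SIMILARITY = {
    ("bank/finance", "money"): 0.9, ("bank/river", "money"): 0.1,
    ("bank/finance", "loan"): 0.8, ("bank/river", "loan"): 0.2,
    ("bank/finance", "water"): 0.1, ("bank/river", "water"): 0.9,
    ("bank/finance", "shore"): 0.2, ("bank/river", "shore"): 0.8,
}


def toy_similarity(sense, term):
    return TOY_SIMILARITY.get((sense, term), 0.0)


SENSES = ["bank/finance", "bank/river"]

# Static thesaurus: one list of neighbours for every occurrence of "bank".
static_terms = [("money", 0.5), ("loan", 0.4), ("water", 0.1), ("shore", 0.1)]

# Dynamic thesaurus: neighbours rebuilt from the syntactic context of one
# occurrence, e.g. "the boat drifted toward the bank".
context_terms = [("water", 0.6), ("shore", 0.5), ("money", 0.1)]

static_best, _ = predominant_sense(SENSES, static_terms, toy_similarity)
dynamic_best, _ = predominant_sense(SENSES, context_terms, toy_similarity)
```

With these toy numbers the static list selects the financial sense for every occurrence, while the context-specific list selects the river sense for the example occurrence. This is the behavior the dynamic thesaurus is meant to enable.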