Word Clustering for Collocation-Based Word Sense Disambiguation

Authors:
Peng Jin;Xu Sun;Yunfang Wu;Shiwen Yu
Affiliations:
Department of Computer Science and Technology, Institute of Computational Linguistics, Peking University, 100871, Beijing, China;Department of Computer Science and Technology, Institute of Computational Linguistics, Peking University, 100871, Beijing, China;Department of Computer Science and Technology, Institute of Computational Linguistics, Peking University, 100871, Beijing, China;Department of Computer Science and Technology, Institute of Computational Linguistics, Peking University, 100871, Beijing, China
Venue:
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

Collocation-based word sense disambiguation achieves relatively high precision, but its main disadvantage is low recall. How can recall be improved without decreasing precision? In this paper, we investigate a word-class approach that extends the collocation list constructed from a manually sense-tagged corpus. The word classes themselves are obtained from a larger corpus that is not sense-tagged. Experimental results show that the F-measure improves to 71%, compared with 54% for the baseline system in which word classes are not considered, although precision decreases slightly. Further study reveals the relationship between the F-measure and the number of word classes trained from corpora of various sizes.
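
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm is used or how conflicting senses within a class are resolved, so the sketch assumes the word classes are already given as a word-to-class mapping and simply skips classes whose seed collocates disagree. All names (`extend_collocations`, `disambiguate`, `precision_recall_f1`) and the toy data are hypothetical.

```python
from collections import defaultdict


def extend_collocations(seed, word_class):
    """Extend a seed collocation list using word classes.

    seed: dict mapping collocate word -> sense (from the sense-tagged corpus).
    word_class: dict mapping word -> class id (assumed to come from clustering
    a larger, untagged corpus; the clustering itself is not shown here).
    """
    # Collect the senses that seed collocates bring to each class.
    class_senses = defaultdict(set)
    for word, sense in seed.items():
        if word in word_class:
            class_senses[word_class[word]].add(sense)

    extended = dict(seed)  # seed entries are always kept
    for word, cls in word_class.items():
        senses = class_senses.get(cls)
        # Assumption: only propagate when the class points to a single sense.
        if word not in extended and senses and len(senses) == 1:
            extended[word] = next(iter(senses))
    return extended


def disambiguate(context, collocations):
    """Return the sense of the first context word found in the list, else None."""
    for word in context:
        if word in collocations:
            return collocations[word]
    return None


def precision_recall_f1(predictions, gold):
    """Precision over answered instances, recall over all instances."""
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(1 for p, g in answered if p == g)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data for an ambiguous target word such as "bank" (illustrative only).
seed = {"river": "shore", "loan": "finance"}
word_class = {"river": 1, "stream": 1, "loan": 2, "mortgage": 2, "tree": 3}
contexts = [["the", "stream"], ["a", "mortgage"], ["old", "tree"]]
gold = ["shore", "finance", "shore"]

extended = extend_collocations(seed, word_class)
baseline_scores = precision_recall_f1(
    [disambiguate(c, seed) for c in contexts], gold)
extended_scores = precision_recall_f1(
    [disambiguate(c, extended) for c in contexts], gold)
```

On this toy data the seed list answers none of the three contexts, while the extended list answers two of them, which is the recall gain the abstract describes. The slight precision loss the abstract reports would come from class-mates that do not actually share the seed collocate's sense, a case the toy data does not include.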