Word Clustering for Collocation-Based Word Sense Disambiguation

  • Authors:
  • Peng Jin;Xu Sun;Yunfang Wu;Shiwen Yu

  • Affiliations:
  • Department of Computer Science and Technology, Institute of Computational Linguistics, Peking University, 100871, Beijing, China (all four authors)

  • Venue:
  • CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2007

Abstract

The main disadvantage of collocation-based word sense disambiguation is that its recall is low, even though its precision is relatively high. How can recall be improved without decreasing precision? In this paper, we investigate a word-class approach that extends the collocation list constructed from a manually sense-tagged corpus. The word classes themselves are obtained from a larger corpus that is not sense-tagged. Experimental results show that the F-measure improves to 71%, compared with 54% for the baseline system that does not use word classes, although precision decreases slightly. Further study reveals the relationship between the F-measure and the number of word classes trained from corpora of various sizes.
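
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the data layout (a dictionary from (target word, collocate) pairs to sense labels, and a dictionary from words to class ids), and the English toy data are all assumptions made for illustration. The sketch extends a seed collocation list so that every word in the same class as a seed collocate inherits that collocate's sense, and it includes the standard balanced F-measure, the harmonic mean of precision and recall.

```python
from collections import defaultdict


def extend_collocations(collocations, word_classes):
    """Extend a seed collocation list using word classes (illustrative only).

    collocations: dict mapping (target, collocate) -> sense label,
                  assumed to come from a sense-tagged corpus.
    word_classes: dict mapping word -> class id, assumed to come from
                  clustering a larger untagged corpus.
    """
    # Invert the word -> class mapping so each class lists its members.
    members = defaultdict(set)
    for word, cls in word_classes.items():
        members[cls].add(word)

    extended = dict(collocations)
    for (target, collocate), sense in collocations.items():
        cls = word_classes.get(collocate)
        if cls is None:
            continue  # collocate was never clustered; nothing to add
        for other in members[cls]:
            # Seed entries keep priority; only unseen pairs are added.
            extended.setdefault((target, other), sense)
    return extended


def disambiguate(target, context, collocations):
    """Return the sense of the first context word found in the list, else None."""
    for word in context:
        sense = collocations.get((target, word))
        if sense is not None:
            return sense
    return None


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper.
seed = {("bank", "river"): "shore", ("bank", "loan"): "finance"}
classes = {"river": 0, "stream": 0, "loan": 1, "mortgage": 1}
extended = extend_collocations(seed, classes)
```

With the seed list alone, a context containing "stream" gives no answer for "bank", which is the low-recall problem the abstract describes. After extension the pair ("bank", "stream") maps to the same sense as ("bank", "river"). An inherited sense can be wrong, which is consistent with the slight drop in precision that the abstract reports.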