Improving word sense disambiguation by pseudo-samples

  • Authors:
  • Xiaojie Wang; Yuji Matsumoto

  • Affiliations:
  • Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan; Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan

  • Venue:
  • IJCNLP'04: Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

Data sparseness is a major problem in word sense disambiguation. Automatic sample acquisition and smoothing are two approaches that have been explored to alleviate its influence. In this paper, we consider a combination of these two methods. First, we propose a pattern-based way to acquire pseudo samples; we then estimate conditional probabilities for variables by combining the pseudo data set with the sense-tagged data set. This combined estimation lets us strike an appropriate balance between the two data sets, which is vital to achieving the best performance. Experiments show that our approach brings significant improvement for Chinese word sense disambiguation.
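
The abstract does not give the estimation formula, so the following is only a minimal sketch of one common way to combine two data sets: linear interpolation of relative-frequency estimates. The function names `mle` and `combine`, the weight `lam`, and the toy samples are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def mle(samples):
    """Relative-frequency estimate of P(feature | sense) from (sense, feature) pairs."""
    pair_counts = Counter(samples)
    sense_counts = Counter(sense for sense, _ in samples)
    return {(s, f): c / sense_counts[s] for (s, f), c in pair_counts.items()}


def combine(tagged, pseudo, lam):
    """Interpolate the two estimates: lam * P_tagged + (1 - lam) * P_pseudo.

    Assumption: a single global weight lam in [0, 1]; the paper may weight
    the two data sets differently.
    """
    p_tagged, p_pseudo = mle(tagged), mle(pseudo)
    keys = set(p_tagged) | set(p_pseudo)
    return {
        k: lam * p_tagged.get(k, 0.0) + (1 - lam) * p_pseudo.get(k, 0.0)
        for k in keys
    }


# Toy (sense, context word) pairs, invented for illustration only.
tagged = [("bank/finance", "money"), ("bank/finance", "loan")]
pseudo = [
    ("bank/finance", "money"),
    ("bank/finance", "money"),
    ("bank/finance", "deposit"),
    ("bank/finance", "loan"),
]

probs = combine(tagged, pseudo, lam=0.5)
```

In this sketch, a context word seen only in the pseudo samples ("deposit") still receives non-zero probability, while `lam` controls how much the smaller sense-tagged set is trusted relative to the pseudo samples.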