Learning word sense disambiguation in biomedical text with difference between training and test distributions

  • Authors:
  • Jeong-Woo Son; Seong-Bae Park

  • Affiliations:
  • Kyungpook National University, Daegu, South Korea; Kyungpook National University, Daegu, South Korea

  • Venue:
  • Proceedings of the third international workshop on Data and text mining in bioinformatics
  • Year:
  • 2009

Abstract

Word sense disambiguation (WSD) is a crucial issue in biomedical text mining, since the performance of diverse biomedical text mining techniques strongly depends on the senses of lexical items. It is therefore natural to treat lexical items as the most important features in WSD. However, because the lexical space is so diverse, WSD methods that apply machine learning to lexical features suffer from the difference between the distributions of training and test documents. To tackle this problem, this paper proposes support vector machines with example-wise weights. In this method, the training distribution is made to coincide with the test distribution by weighting each training example according to its similarity to all test data. The experimental results show that the distribution change between training and test data does occur in practice, and that the proposed method, which accounts for this change in its training phase, outperforms ordinary support vector machines.
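The weighting idea described in the abstract can be illustrated with a small plain-Python sketch. This is not the authors' implementation: the abstract specifies neither the similarity measure nor how the weights enter the SVM objective. The sketch therefore assumes a Gaussian (RBF) kernel as the similarity and takes each training example's weight as its mean similarity to the test set, rescaled to average 1. It also assumes the weight scales that example's hinge loss in a linear SVM trained by full-batch subgradient descent. The names `example_weights`, `train_weighted_svm`, and `predict`, along with the parameter values (`gamma`, `lam`, `lr`, `epochs`), are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) similarity; an assumed choice of similarity measure."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def example_weights(train_X, test_X, gamma=1.0):
    """Hypothetical helper: weight each training example by its mean
    similarity to all test examples, rescaled so the weights average 1."""
    raw = [sum(rbf(x, t, gamma) for t in test_X) / len(test_X) for x in train_X]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def train_weighted_svm(X, y, c, lam=0.01, lr=0.1, epochs=500):
    """Hypothetical linear SVM minimizing
        lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * (w.x_i + b))
    by full-batch subgradient descent; labels y_i are in {-1, +1}."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi, ci in zip(X, y, c):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: hinge term is active
                for j in range(dim):
                    gw[j] -= ci * yi * xi[j] / n  # weighted hinge subgradient
                gb -= ci * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Sign of the decision function, returned as -1 or +1."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    # Toy 2-D data: the test points sit higher on the second axis than the
    # training points, so training examples near them receive larger weights.
    train_X = [[-2.0, 0.0], [-1.5, 0.5], [1.5, -0.5], [2.0, 0.0]]
    train_y = [-1, -1, 1, 1]
    test_X = [[1.8, 1.0], [-1.8, 1.0]]
    c = example_weights(train_X, test_X)
    w, b = train_weighted_svm(train_X, train_y, c)
    print("weights:", [round(ci, 3) for ci in c])
    print("predictions:", [predict(w, b, x) for x in test_X])
```

The hand-written trainer only keeps the sketch dependency-free. Any SVM implementation that accepts per-example weights, such as scikit-learn's `SVC.fit(X, y, sample_weight=...)`, could take its place while the weighting step stays the same.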