Coping with Distribution Change in the Same Domain Using Similarity-Based Instance Weighting

  • Authors:
  • Jeong-Woo Son;Hyun-Je Song;Seong-Bae Park;Se-Young Park

  • Affiliations:
  • Department of Computer Engineering, Kyungpook National University, Daegu, Korea 702-701 (all authors)

  • Venue:
  • ACML '09 Proceedings of the 1st Asian Conference on Machine Learning: Advances in Machine Learning
  • Year:
  • 2009

Abstract

Lexical features are considered the most crucial features in natural language processing (NLP) and are therefore widely used in machine learning algorithms applied to NLP tasks. However, because the lexical space is so diverse, learning algorithms that rely on lexical features suffer from differences between the distributions of training and test data. To overcome this distribution change, this paper proposes support vector machines with example-wise weights. The training distribution is made to coincide with the test distribution by weighting each training example according to its similarity to all test data. Experimental results on text chunking show that a distribution change between training and test data does exist, and that the proposed method, which accounts for this change in its training phase, outperforms ordinary support vector machines.
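
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or aggregation the authors use, so the choices here are assumptions made only for illustration: cosine similarity as the measure, and the mean over all test vectors as the weight. The function names `cosine` and `instance_weights` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def instance_weights(train, test):
    """Weight each training vector by its mean similarity to all test vectors.

    Assumption: mean cosine similarity stands in for the paper's
    similarity measure, which the abstract does not specify.
    """
    if not test:
        raise ValueError("test set must be non-empty")
    return [sum(cosine(x, t) for t in test) / len(test) for x in train]


# Hypothetical toy lexical feature vectors (e.g. bag-of-words counts).
train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
test = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]

weights = instance_weights(train, test)
for x, w in zip(train, weights):
    print(x, round(w, 4))
```

Training examples that resemble the test data receive weights near 1, while examples that share no features with it receive weights near 0. Weights of this kind could then scale each example's misclassification penalty during SVM training; for instance, scikit-learn's `SVC.fit` accepts a `sample_weight` argument for this purpose. Whether that matches the paper's exact formulation is not something the abstract confirms.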