Coping with Distribution Change in the Same Domain Using Similarity-Based Instance Weighting

Authors:
Jeong-Woo Son;Hyun-Je Song;Seong-Bae Park;Se-Young Park
Affiliations:
Department of Computer Engineering, Kyungpook National University, Daegu, Korea 702-701;Department of Computer Engineering, Kyungpook National University, Daegu, Korea 702-701;Department of Computer Engineering, Kyungpook National University, Daegu, Korea 702-701;Department of Computer Engineering, Kyungpook National University, Daegu, Korea 702-701
Venue:
ACML '09 Proceedings of the 1st Asian Conference on Machine Learning: Advances in Machine Learning
Year:
2009

Citing 12
Cited 0

An introduction to support Vector Machines: and other kernel-based learning methods

An introduction to support Vector Machines: and other kernel-based learning methods
A Boosted Maximum Entropy Model for Learning Text Chunking

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Learning rules and their exceptions

The Journal of Machine Learning Research
Learning and evaluating classifiers under sample selection bias

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Text chunking using regularized Winnow

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Chunking with support vector machines

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Supervised and unsupervised PCFG adaptation to novel domains

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Large Margin Methods for Structured and Interdependent Output Variables

The Journal of Machine Learning Research
An empirical study of the domain dependence of supervised word sense disambiguation systems

EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
One sense per collocation and genre/topic variations

EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
Speaker identification via support vector classifiers

ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 01
Semi-Supervised Learning

Semi-Supervised Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

Lexicons are considered as the most crucial features in natural language processing (NLP), and thus often used in machine learning algorithms applied to NLP tasks. However, due to the diversity of lexical space, the machine learning algorithms with lexical features suffer from the difference between distributions of training and test data. In order to overcome the distribution change, this paper proposes support vector machines with example-wise weights. The training distribution coincides with the test distribution by weighting training examples according to their similarity to all test data. The experimental results on text chunking show that the distribution change between training and test data is actually recognized and the proposed method which considers this change in its training phase outperforms ordinary support vector machines.