Learning word sense disambiguation in biomedical text with difference between training and test distributions

Authors:
Jeong-Woo Son;Seong-Bae Park
Affiliations:
Kyungpook National University, Daegu, South Korea;Kyungpook National University, Daegu, South Korea
Venue:
Proceedings of the third international workshop on Data and text mining in bioinformatics
Year:
2009

Abstract

Word sense disambiguation (WSD) is a crucial issue in biomedical text mining, since the performance of diverse biomedical text mining techniques depends strongly on the senses of lexical items. It is therefore natural to treat lexical items as the most important features in WSD. However, because the lexical space is so diverse, WSD methods that apply machine learning to lexical features suffer from the difference between the distributions of training and test documents. To tackle this problem, this paper proposes support vector machines with example-wise weights. In this method, the training distribution is made to coincide with the test distribution by weighting each training example according to its similarity to all test data. The experimental results show that the distribution change between training and test data is indeed observed, and that the proposed method, which accounts for this change in its training phase, outperforms ordinary support vector machines.
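
The idea of example-wise weighting can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state which similarity measure or optimizer is used, so cosine similarity over word-count vectors, mean aggregation over the test set, and a plain subgradient-descent linear SVM are all assumptions made here for illustration. The function names (`cosine`, `example_weights`, `train_weighted_svm`, `predict`) and the toy data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def example_weights(train_X, test_X):
    """Weight each training example by its mean similarity to all test examples.

    Assumption: cosine similarity and mean aggregation are illustrative
    choices; the abstract does not specify the similarity measure.
    """
    raw = [sum(cosine(x, t) for t in test_X) / len(test_X) for x in train_X]
    mean = sum(raw) / len(raw)
    # Rescale so the weights average to 1, keeping the loss scale unchanged.
    return [r / mean for r in raw] if mean else [1.0] * len(raw)


def train_weighted_svm(X, y, weights, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM fitted by subgradient descent on a weighted hinge loss.

    Objective: lam/2 * ||w||^2 + mean_i weights[i] * max(0, 1 - y_i*(w.x_i + b))
    """
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, weights):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for j in range(dim):
                    grad_w[j] -= ci * yi * xi[j] / n
                grad_b -= ci * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Return +1 or -1 according to the sign of the decision function."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data: each vector counts three context words; labels are two senses.
train_X = [[2, 0, 1], [1, 0, 0], [0, 2, 1], [0, 1, 0]]
train_y = [1, 1, -1, -1]
# Unlabeled test contexts, skewed toward the first context word.
test_X = [[3, 0, 1], [1, 0, 1], [0, 1, 1]]

weights = example_weights(train_X, test_X)
model = train_weighted_svm(train_X, train_y, weights)
predictions = [predict(model, x) for x in train_X]
```

Training examples that resemble the test contexts receive weights above 1 and so contribute more to the hinge loss, which shifts the decision boundary toward the region the test data occupies. With an off-the-shelf library the same weights could be supplied at fit time, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`.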