Semi-supervised Word Sense Disambiguation Using the Web as Corpus

Authors:
Rafael Guzmán-Cabrera;Paolo Rosso;Manuel Montes-Y-Gómez;Luis Villaseñor-Pineda;David Pinto-Avendaño
Affiliations:
FIMEE, Universidad de Guanajuato, Mexico and NLE Lab, DSIC, Universidad Politécnica de Valencia, Spain;NLE Lab, DSIC, Universidad Politécnica de Valencia, Spain;LabTL, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico;LabTL, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico;FCC, Benemérita Universidad Autónoma de Puebla, Mexico
Venue:
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

Like any other classification task, Word Sense Disambiguation requires a large number of training examples. Such examples are easy to obtain for most tasks but particularly difficult to obtain for this one. For this reason, in this paper we investigate a Web-based approach for determining the correct sense of an ambiguous word based only on its surrounding context. In particular, we propose a semi-supervised method that is especially suited to working with just a few training examples. The method automatically extracts unlabeled examples from the Web and iteratively integrates them into the training data set. The experimental results, obtained over a subset of ten nouns from the SemEval lexical sample task, are encouraging: they show that unlabeled examples extracted from the Web can improve the baseline accuracy of classifiers such as Naïve Bayes and SVM.
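
The iterative integration of unlabeled examples described in the abstract can be illustrated with a generic self-training loop. The following is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say how examples are selected, so the confidence threshold, the stopping rule, and the function names (`train_nb`, `predict_nb`, `self_train`) are all hypothetical. The hard-coded contexts for the word "bank" stand in for snippets that would be retrieved from the Web, and a small hand-written Naïve Bayes classifier with add-one smoothing is used as the base learner.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Fit a multinomial Naive Bayes model on (tokens, sense) pairs."""
    sense_counts = Counter(sense for _, sense in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return (most probable sense, its posterior probability)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, count in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][tok] + 1) / denom)
        log_scores[sense] = score
    # Normalise the log scores into posterior probabilities.
    top = max(log_scores.values())
    weights = {s: math.exp(v - top) for s, v in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(labeled, unlabeled, threshold=0.7, max_iters=5):
    """Repeatedly add confidently classified unlabeled contexts to the training set.

    `threshold` and `max_iters` are illustrative assumptions, not values from the paper.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iters):
        model = train_nb(labeled)
        confident, rest = [], []
        for tokens in pool:
            sense, prob = predict_nb(model, tokens)
            if prob >= threshold:
                confident.append((tokens, sense))
            else:
                rest.append(tokens)
        if not confident:
            break  # nothing new to add, so stop early
        labeled.extend(confident)
        pool = rest
    return train_nb(labeled), labeled, pool


# Toy data: a few labeled contexts for the ambiguous noun "bank".
labeled = [
    ("deposit money in the bank account".split(), "finance"),
    ("the bank approved the loan".split(), "finance"),
    ("fishing on the river bank".split(), "river"),
    ("the muddy bank of the stream".split(), "river"),
]
# Stand-ins for unlabeled contexts that would be extracted from the Web.
unlabeled = [
    "bank loan and account fees".split(),
    "muddy river bank near the stream".split(),
    "the bank was closed".split(),
]

model, final_labeled, remaining = self_train(labeled, unlabeled)
print(len(final_labeled), "training examples;", len(remaining), "left unlabeled")
print(predict_nb(model, ["loan", "money"]))
```

In this sketch a context is added only when the classifier's posterior for its predicted sense reaches the threshold, so ambiguous contexts such as "the bank was closed" stay in the unlabeled pool instead of adding noisy labels to the training set.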