Semi-supervised Word Sense Disambiguation Using the Web as Corpus

  • Authors:
  • Rafael Guzmán-Cabrera; Paolo Rosso; Manuel Montes-Y-Gómez; Luis Villaseñor-Pineda; David Pinto-Avendaño

  • Affiliations:
  • FIMEE, Universidad de Guanajuato, Mexico and NLE Lab, DSIC, Universidad Politécnica de Valencia, Spain; NLE Lab, DSIC, Universidad Politécnica de Valencia, Spain; LabTL, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico; LabTL, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico; FCC, Benemérita Universidad Autónoma de Puebla, Mexico

  • Venue:
  • CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

Like any other classification task, Word Sense Disambiguation requires a large number of training examples. Such examples are easy to obtain for most tasks, but they are particularly difficult to obtain for this one. For this reason, in this paper we investigate a Web-based approach for determining the correct sense of an ambiguous word based only on its surrounding context. In particular, we propose a semi-supervised method that is especially suited to working with just a few training examples. The method automatically extracts unlabeled examples from the Web and iteratively integrates them into the training data set. The experimental results, obtained over a subset of ten nouns from the SemEval lexical sample task, are encouraging: they show that it is possible to improve the baseline accuracy of classifiers such as Naïve Bayes and SVM using some unlabeled examples extracted from the Web.
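The iterative integration of unlabeled examples can be illustrated with a generic self-training loop. The following is a minimal sketch and not the authors' implementation: the abstract does not state how Web snippets are retrieved, how candidates are selected, or which features are used. Everything here is an assumption made for illustration, including the bag-of-words Naïve Bayes model, the confidence threshold, the `per_iter` batch size, the function names, and the toy "bank" contexts that stand in for snippets extracted from the Web.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Train a multinomial Naive Bayes model from (tokens, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict(model, tokens):
    """Return (most probable sense, its posterior probability)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, count in sense_counts.items():
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][tok] + 1) / denom)
        log_scores[sense] = score
    # Normalize log scores into posterior probabilities.
    top = max(log_scores.values())
    exp = {s: math.exp(v - top) for s, v in log_scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def self_train(labeled, unlabeled, per_iter=2, max_iter=5, threshold=0.6):
    """Iteratively move confidently classified contexts into the training set.

    per_iter, max_iter and threshold are hypothetical parameters.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iter):
        if not pool:
            break
        model = train_nb(labeled)
        preds = [predict(model, toks) for toks in pool]
        # Rank the pool by classifier confidence, highest first.
        order = sorted(range(len(pool)), key=lambda i: preds[i][1], reverse=True)
        chosen = [i for i in order[:per_iter] if preds[i][1] >= threshold]
        if not chosen:
            break  # nothing is confident enough; stop early
        labeled.extend((pool[i], preds[i][0]) for i in chosen)
        pool = [toks for i, toks in enumerate(pool) if i not in chosen]
    return train_nb(labeled), labeled


# Toy data: two hand-labeled contexts of the ambiguous noun "bank".
labeled = [
    ("deposit money bank account".split(), "finance"),
    ("bank river water shore".split(), "river"),
]
# Stand-ins for unlabeled snippets that would be extracted from the Web.
unlabeled = [
    "bank account loan interest".split(),
    "river bank fishing water".split(),
    "loan interest rate bank".split(),
]

model, final_labeled = self_train(labeled, unlabeled)
sense, confidence = predict(model, "loan interest".split())
```

In this sketch, words such as "loan" and "interest" are absent from the two seed examples and only become informative once the automatically labeled contexts are added. The threshold matters: accepting low-confidence predictions can feed labeling errors back into the training set, which is the main risk of any self-training scheme.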