Semi-supervised training of a kernel PCA-based model for word sense disambiguation

  • Authors: Weifeng Su; Marine Carpuat; Dekai Wu

  • Affiliations: University of Science and Technology, Clear Water Bay, Hong Kong (all three authors)

  • Venue: COLING '04: Proceedings of the 20th International Conference on Computational Linguistics
  • Year: 2004

Abstract

In this paper, we introduce a new semi-supervised learning model for word sense disambiguation based on Kernel Principal Component Analysis (KPCA), with experiments showing that it can further improve accuracy over supervised KPCA models that have achieved WSD accuracy superior to the best published individual models. Although empirical results with supervised KPCA models demonstrate significantly better accuracy compared to the state-of-the-art achieved by either naïve Bayes or maximum entropy models on Senseval-2 data, we identify specific sparse data conditions under which supervised KPCA models deteriorate to essentially a most-frequent-sense predictor. We discuss the potential of KPCA for leveraging unannotated data for partially-unsupervised training to address these issues, leading to a composite model that combines both the supervised and semi-supervised models.
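
To make the idea of leveraging unannotated data concrete, here is a minimal toy sketch. It is not the authors' algorithm. It assumes a degree-2 polynomial kernel, a single principal component found by power iteration, and nearest-labeled-neighbor sense assignment in the projected space. The feature vectors, the sense labels ("finance", "river"), and all function names are hypothetical. The only point it illustrates is that unannotated contexts can help shape the kernel PCA projection even though they carry no sense labels.

```python
import math

def poly_kernel(x, y, degree=2):
    """Polynomial kernel (x . y + 1)^degree. The kernel choice is an assumption."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** degree

def centered_kernel_matrix(points, kernel):
    """Kernel matrix of all points, centered in feature space."""
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def top_component_projections(Kc, iterations=200):
    """Project every point onto the leading kernel principal component.

    Power iteration finds the top eigenpair (eigenvalue, v) of the centered
    kernel matrix; the projection of point i is sqrt(eigenvalue) * v[i].
    """
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
        eigenvalue = norm
    return [math.sqrt(eigenvalue) * x for x in v]

def disambiguate(labeled, unlabeled, target):
    """Assign the sense of the nearest labeled context in the projected space.

    Labeled, unannotated, and target contexts all enter the kernel matrix,
    so the unannotated data influences the learned projection.
    """
    points = [vec for vec, _ in labeled] + list(unlabeled) + [target]
    proj = top_component_projections(centered_kernel_matrix(points, poly_kernel))
    target_proj = proj[-1]
    best = min(range(len(labeled)), key=lambda i: abs(proj[i] - target_proj))
    return labeled[best][1]

# Hypothetical toy contexts for an ambiguous word, as 2-d feature vectors.
labeled = [([1.0, 0.0], "finance"), ([0.0, 1.0], "river")]
unlabeled = [[0.9, 0.1], [0.1, 0.9]]

print(disambiguate(labeled, unlabeled, [0.8, 0.2]))
```

A real system would use high-dimensional contextual features, retain several principal components, and need a fallback for the sparse-data conditions the abstract describes. None of those details are specified here.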