Word sense disambiguation by learning from unlabeled data

  • Authors:
  • Seong-Bae Park;Byoung-Tak Zhang;Yung Taek Kim

  • Affiliations:
  • Seoul National University, Seoul, Korea;Seoul National University, Seoul, Korea;Seoul National University, Seoul, Korea

  • Venue:
  • ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

Most corpus-based approaches to natural language processing suffer from lack of training data. This is because acquiring a large number of labeled data is expensive. This paper describes a learning method that exploits unlabeled data to tackle data sparseness problem. The method uses committee learning to predict the labels of unlabeled data that augment the existing training data. Our experiments on word sense disambiguation show that predictive accuracy is significantly improved by using additional unlabeled data.