Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme

  • Authors:
  • Carsten Lanquillon

  • Venue:
  • ECML '00 Proceedings of the 11th European Conference on Machine Learning
  • Year:
  • 2000

Abstract

Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, in many text classification tasks, labeled training documents are expensive to obtain, while unlabeled documents are readily available in large quantities. This paper describes a general framework for extending any text learning algorithm to utilize unlabeled documents in addition to labeled documents using an Expectation-Maximization-like scheme. Our instantiation of this partially supervised classification framework with a similarity-based single prototype classifier achieves encouraging results on two real-world text datasets. Classification error is reduced by up to 38% when using unlabeled documents in addition to labeled documents.
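
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how an EM-like scheme of this kind might look when wrapped around a single prototype classifier. Several choices here are assumptions, not details taken from the paper: documents are given as dense term-weight vectors, similarity is cosine similarity, each class prototype is the mean of its normalized document vectors, and unlabeled documents receive hard (not probabilistic) class assignments. The function names `fit_prototypes`, `predict`, and `em_like_fit` are hypothetical.

```python
import numpy as np


def _normalize(X):
    """Scale each row to unit length (all-zero rows are left as zeros)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def fit_prototypes(X, y, n_classes):
    """Build one prototype per class: the normalized mean of its documents."""
    X = _normalize(X)
    y = np.asarray(y)
    protos = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        members = X[y == c]
        if len(members) > 0:
            protos[c] = members.mean(axis=0)
    return _normalize(protos)


def predict(X, protos):
    """Assign each document to the class with the most similar prototype."""
    return np.argmax(_normalize(X) @ protos.T, axis=1)


def em_like_fit(X_lab, y_lab, X_unl, n_classes, max_iter=20):
    """Assumed EM-like loop: label unlabeled docs, retrain, repeat."""
    y_lab = np.asarray(y_lab)
    # Initial classifier trained on the labeled documents only.
    protos = fit_prototypes(X_lab, y_lab, n_classes)
    # "E-like" step: guess labels for the unlabeled documents.
    y_unl = predict(X_unl, protos)
    X_all = np.vstack([np.asarray(X_lab, dtype=float),
                       np.asarray(X_unl, dtype=float)])
    for _ in range(max_iter):
        # "M-like" step: retrain on labeled plus currently guessed labels.
        protos = fit_prototypes(X_all, np.concatenate([y_lab, y_unl]), n_classes)
        new_y_unl = predict(X_unl, protos)
        if np.array_equal(new_y_unl, y_unl):
            break  # guessed labels no longer change
        y_unl = new_y_unl
    return protos
```

Under these assumptions, `em_like_fit(X_lab, y_lab, X_unl, n_classes)` returns the final prototypes, which can then be passed to `predict` to classify new documents. Any other base learner could stand in for the prototype classifier by replacing `fit_prototypes` and `predict`, which is the sense in which the framework is described as general.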