Learning with unlabeled data for text categorization using bootstrapping and feature projection techniques

  • Authors:
  • Youngjoong Ko; Jungyun Seo

  • Affiliations:
  • Sogang Univ., Mapo-gu, Seoul, Korea; Sogang Univ., Mapo-gu, Seoul, Korea

  • Venue:
  • ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2004

Abstract

A wide range of supervised learning algorithms has been applied to text categorization. However, supervised approaches have some problems, one of which is that they require a large, often prohibitive, number of labeled training documents to learn accurately. Acquiring class labels for training data is generally costly, whereas gathering a large quantity of unlabeled data is cheap. We propose a new automatic text categorization method that learns from unlabeled data only, using a bootstrapping framework and a feature projection technique. In our experiments, the method achieved performance reasonably comparable to that of a supervised method. Used in a text categorization task, it could make building text categorization systems significantly faster and less expensive.