Combining Labeled and Unlabeled Data for Text Classification with a Large Number of Categories

  • Authors:
  • Rayid Ghani

  • Venue:
  • ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
  • Year:
  • 2001

Abstract

We develop a framework for incorporating unlabeled data into the Error-Correcting Output Coding (ECOC) setup: a multiclass problem is decomposed into multiple binary problems, and Co-Training is then used to learn each individual binary classification problem. We show that the method is especially useful for classification tasks involving a large number of categories, where Co-Training does not perform well by itself. When combined with ECOC, it outperforms several other algorithms that combine labeled and unlabeled data for text classification in terms of accuracy, precision-recall tradeoff, and efficiency.
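
The ECOC decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the code matrix `CODE`, the helper names `binary_labels` and `decode`, and the choice of Hamming-distance decoding are assumptions made for illustration only. Each column of the matrix defines one binary problem; in the framework described above, each such problem would be learned with Co-Training on labeled and unlabeled documents, which this sketch does not implement.

```python
# Hypothetical 4-class, 7-bit code matrix (one codeword per class).
# Any two codewords differ in 4 positions, so a single wrong bit
# among the 7 binary predictions can still be corrected.
CODE = [
    (1, 1, 1, 1, 1, 1, 1),  # class 0
    (0, 0, 0, 0, 1, 1, 1),  # class 1
    (0, 0, 1, 1, 0, 0, 1),  # class 2
    (0, 1, 0, 1, 0, 1, 0),  # class 3
]


def binary_labels(code, y, bit):
    """Relabel multiclass labels y as 0/1 targets for binary problem `bit`."""
    return [code[label][bit] for label in y]


def hamming(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != z for x, z in zip(a, b))


def decode(code, predicted_bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(range(len(code)), key=lambda c: hamming(code[c], predicted_bits))


# Binary problem 4 separates classes {0, 1} from classes {2, 3}.
targets_bit4 = binary_labels(CODE, [0, 1, 2, 3], bit=4)

# Suppose the 7 binary classifiers output class 2's codeword
# with the first bit flipped by a classification error.
noisy_prediction = (1, 0, 1, 1, 0, 0, 1)
recovered_class = decode(CODE, noisy_prediction)
```

In this sketch the binary learners are left out entirely; only the relabeling step and the decoding step are shown, since those are the parts that ECOC itself contributes.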