Email classification with co-training

  • Authors:
  • Svetlana Kiritchenko; Stan Matwin

  • Affiliations:
  • School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada; School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada

  • Venue:
  • CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 2001

Abstract

The main problems in text classification are the lack of labeled data and the cost of labeling unlabeled data. We address these problems by exploring co-training, an algorithm that uses unlabeled data together with a few labeled examples to boost the performance of a classifier. We experiment with co-training in the email domain. Our results show that the performance of co-training depends on the learning algorithm it uses. In particular, Support Vector Machines significantly outperform Naive Bayes on email classification.
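The co-training idea described in the abstract can be illustrated with a small sketch. This is a hypothetical Python example, not the authors' implementation. It assumes each email is split into two token "views" (here, hypothetically, a subject view and a body view), uses a from-scratch multinomial Naive Bayes as the base learner, and simplifies the selection step so that each classifier labels only its single most confident unlabeled example per round. The class labels "work" and "personal" and the toy messages are invented for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over lists of tokens."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.prior[c] = math.log(len(class_docs) / len(labels))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_scores(self, doc):
        # Unnormalized log-posterior per class; unseen words are ignored.
        v = len(self.vocab)
        return {
            c: self.prior[c]
            + sum(math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                  for w in doc if w in self.vocab)
            for c in self.classes
        }

    def predict_with_confidence(self, doc):
        # Softmax over log-scores gives the posterior of the predicted class.
        scores = self.log_scores(doc)
        m = max(scores.values())
        probs = {c: math.exp(s - m) for c, s in scores.items()}
        best = max(probs, key=probs.get)
        return best, probs[best] / sum(probs.values())


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: (view1, view2, label) triples; unlabeled: (view1, view2) pairs."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        ys = [y for _, _, y in labeled]
        h1 = NaiveBayes().fit([x1 for x1, _, _ in labeled], ys)
        h2 = NaiveBayes().fit([x2 for _, x2, _ in labeled], ys)
        # Each view's classifier labels the pool examples it is most sure of,
        # and those examples become training data for BOTH classifiers.
        for h, view in ((h1, 0), (h2, 1)):
            ranked = sorted(pool, key=lambda ex: h.predict_with_confidence(ex[view])[1],
                            reverse=True)
            for ex in ranked[:per_round]:
                label, _ = h.predict_with_confidence(ex[view])
                labeled.append((ex[0], ex[1], label))
                pool.remove(ex)
    ys = [y for _, _, y in labeled]
    h1 = NaiveBayes().fit([x1 for x1, _, _ in labeled], ys)
    h2 = NaiveBayes().fit([x2 for _, x2, _ in labeled], ys)
    return h1, h2, labeled


def predict(h1, h2, view1, view2):
    # Combine the two views by adding their log-scores.
    s1, s2 = h1.log_scores(view1), h2.log_scores(view2)
    combined = {c: s1[c] + s2[c] for c in h1.classes}
    return max(combined, key=combined.get)


# Hypothetical toy data: (subject tokens, body tokens[, label]).
labeled = [
    (["meeting", "agenda"], ["project", "deadline", "report"], "work"),
    (["party", "saturday"], ["dinner", "friends", "fun"], "personal"),
]
unlabeled = [
    (["agenda", "review"], ["report", "budget"]),
    (["saturday", "bbq"], ["friends", "beach"]),
    (["meeting", "budget"], ["deadline", "review"]),
    (["fun", "weekend"], ["dinner", "movie"]),
]
h1, h2, final = co_train(labeled, unlabeled)
print(predict(h1, h2, ["meeting"], ["report"]))  # → work
```

The base learner is isolated in one class, so comparing learners, as the abstract does for Naive Bayes and Support Vector Machines, would amount to substituting another classifier that can return a confidence score; this sketch does not implement an SVM.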