Email classification with co-training

  • Authors:
  • Svetlana Kiritchenko;Stan Matwin

  • Affiliations:
  • University of Ottawa, Ottawa, ON, Canada;University of Ottawa, Ottawa, ON, Canada

  • Venue:
  • Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The main problems in text classification are lack of labeled data, as well as the cost of labeling the unlabeled data. We address these problems by exploring co-training - an algorithm that uses unlabeled data along with a few labeled examples to boost the performance of a classifier. We experiment with co-training on the email domain. Our results show that the performance of co-training depends on the learning algorithm it uses. In particular, Support Vector Machines significantly outperforms Naive Bayes on email classification.