A novel refinement approach for text categorization

  • Authors:
  • Songbo Tan; Xueqi Cheng; Moustafa M. Ghanem; Bin Wang; Hongbo Xu

  • Affiliations:
  • ICT, Beijing, CHINA & Chinese Academy of Sciences, CHINA; ICT, Beijing, CHINA; Imperial College London, London, UK; ICT, Beijing, CHINA; ICT, Beijing, CHINA

  • Venue:
  • Proceedings of the 14th ACM international conference on Information and knowledge management
  • Year:
  • 2005

Abstract

In this paper we present a novel strategy, DragPushing, for improving the performance of text classifiers. The strategy is generic: it uses training errors to successively refine the classification model of a base classifier. We describe how it is applied to produce two new classification algorithms: a Refined Centroid Classifier and a Refined Naïve Bayes Classifier. We present an extensive experimental evaluation of both algorithms on three English collections and one Chinese corpus. The results indicate that in each case the refined classifiers achieve a significant performance improvement over their base classifiers. Furthermore, the performance of the Refined Centroid Classifier is comparable to, if not better than, that of a state-of-the-art support vector machine (SVM)-based classifier, at a much lower computational cost.
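
The abstract does not spell out the refinement rule, so the following is only a minimal sketch of the general idea of refining a centroid classifier from its training errors. It assumes, as the name DragPushing suggests, that a misclassified training example drags the centroid of its true class toward itself and pushes the centroid of the wrongly predicted class away. The function names and the `step` and `max_iter` parameters are hypothetical and are not taken from the paper.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def train_centroids(docs, labels):
    """Base classifier: one centroid per class, the sum of its training vectors."""
    centroids = {}
    for vec, label in zip(docs, labels):
        acc = centroids.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return centroids

def predict(centroids, vec):
    """Return the class whose normalized centroid has the largest dot product with vec."""
    return max(sorted(centroids), key=lambda c: dot(normalize(centroids[c]), vec))

def refine(centroids, docs, labels, step=0.5, max_iter=10):
    """Assumed refinement loop: adjust centroids using misclassified training examples."""
    for _ in range(max_iter):
        errors = 0
        for vec, label in zip(docs, labels):
            guess = predict(centroids, vec)
            if guess != label:
                errors += 1
                for i, x in enumerate(vec):
                    centroids[label][i] += step * x  # drag the true class toward the example
                    centroids[guess][i] -= step * x  # push the wrong class away from it
        if errors == 0:  # stop once the training set is classified without error
            break
    return centroids

# Toy data: unit-length 2-D "documents"; the fourth one lies near the class boundary.
docs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [0.0, 1.0]]
labels = ["a", "a", "a", "a", "b", "b"]

centroids = train_centroids(docs, labels)
base_guess = predict(centroids, [0.6, 0.8])   # prediction of the unrefined classifier
refine(centroids, docs, labels)
refined = [predict(centroids, d) for d in docs]
```

On this toy data the base centroid classifier misclassifies the boundary example, and the refinement loop corrects it without disturbing the other training examples. The per-iteration cost is one pass over the training set with one similarity computation per class, which is consistent with the low computational cost the abstract claims, though the sketch makes no attempt to reproduce the reported results.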