Text categorization using SVMs with Rocchio ensemble for Internet information classification

  • Authors:
  • Xin Xu;Bofeng Zhang;Qiuxi Zhong

  • Affiliations:
  • School of Computer, National University of Defense Technology, Changsha, P.R. China (all three authors)

  • Venue:
  • ICCNMC'05 Proceedings of the Third International Conference on Networking and Mobile Computing
  • Year:
  • 2005

Abstract

In this paper, a novel text categorization method based on multi-class Support Vector Machines (SVMs) with a Rocchio ensemble is proposed for Internet information classification and filtering. The multi-class SVM classifier with Rocchio ensemble has a novel cascaded architecture in which a Rocchio linear classifier processes all the data and only a selected part of the data is re-processed by the multi-class SVM classifier. The data selection for the SVM is based on the validation results of the Rocchio classifier, so that only data classes with lower precision are processed by the SVM classifier. The whole cascaded ensemble classifier takes advantage of both the multi-class SVM and the Rocchio classifier. On the one hand, the small computational cost and fast processing speed of Rocchio suit large-scale web information classification and filtering applications such as spam mail filtering at network gateways. On the other hand, the good generalization ability of multi-class SVMs can be employed to further improve Rocchio's precision. The whole ensemble classifier can be viewed as an efficient way to trade off the processing speed and precision of different classifiers. Experimental results on real web text data illustrate the effectiveness of the proposed method.
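
The cascaded routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified, centroid-only form of Rocchio (no negative-example term), sparse documents stored as term-to-weight dicts, and a reading in which a document is re-processed when Rocchio assigns it to a class whose validation precision is below a threshold. All function names, the `threshold` parameter, and the `second_stage` callable (a stand-in for the multi-class SVM) are hypothetical.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse term-weight dict to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (their cosine)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train_rocchio(docs, labels):
    """Build one unit-length centroid per class (assumed centroid-only Rocchio)."""
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in normalize(doc).items():
            sums[label][term] += weight
    return {label: normalize(vec) for label, vec in sums.items()}


def rocchio_predict(centroids, doc):
    """Return the class whose centroid is most similar to the document."""
    unit = normalize(doc)
    return max(centroids, key=lambda c: cosine(centroids[c], unit))


def low_precision_classes(centroids, val_docs, val_labels, threshold):
    """Classes whose Rocchio precision on validation data is below threshold."""
    predicted = defaultdict(int)
    correct = defaultdict(int)
    for doc, label in zip(val_docs, val_labels):
        guess = rocchio_predict(centroids, doc)
        predicted[guess] += 1
        if guess == label:
            correct[guess] += 1
    return {c for c, n in predicted.items() if correct[c] / n < threshold}


def cascade_predict(centroids, weak_classes, second_stage, doc):
    """Rocchio handles every document; only weak-class predictions are re-processed."""
    guess = rocchio_predict(centroids, doc)
    if guess in weak_classes:
        return second_stage(doc)  # hypothetical stand-in for the multi-class SVM
    return guess
```

In this sketch the expensive classifier runs only for documents that fall into the weak classes, which is where the speed and precision trade-off comes from. A trained multi-class SVM, for example a wrapper around the `predict` method of scikit-learn's `LinearSVC`, could be passed as `second_stage`; that pairing is an assumption for illustration, not something the abstract specifies.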