In this paper, a novel text categorization method based on multi-class Support Vector Machines (SVMs) with a Rocchio ensemble is proposed for Internet information classification and filtering. The multi-class SVM classifier with Rocchio ensemble has a novel cascaded architecture in which a Rocchio linear classifier processes all the data and only a selected part of the data is re-processed by the multi-class SVM classifier. The data passed to the SVM are selected from the validation results of the Rocchio classifier, so that only the classes with lower precision are re-processed by the SVM classifier. The cascaded ensemble classifier as a whole takes advantage of both the multi-class SVM and the Rocchio classifier. On the one hand, the low computational cost, and hence fast processing speed, of Rocchio makes it suitable for large-scale web information classification and filtering applications such as spam mail filtering at network gateways. On the other hand, the good generalization ability of multi-class SVMs can be used to further improve on Rocchio's precision. The ensemble classifier can therefore be viewed as an efficient way to trade off the processing speed and precision of different classifiers. Experimental results on real web text data illustrate the effectiveness of the proposed method.
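The cascade can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Documents are assumed to be already vectorized into feature lists (for example TF-IDF weights). The Rocchio stage is a plain prototype classifier with cosine similarity, and the multi-class SVM is swapped for a one-vs-rest linear SVM trained with the Pegasos stochastic subgradient method. The names `RocchioSVMCascade`, `precision_threshold`, `gamma`, `lam` and `epochs` are hypothetical, and so is the choice to train the SVM on the full training set.

```python
# Hypothetical sketch of a Rocchio -> SVM cascade; not the paper's code.
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean_vector(rows, dim):
    return [sum(col) / len(rows) for col in zip(*rows)] if rows else [0.0] * dim

class Rocchio:
    """Prototype classifier: one prototype per class, cosine-similarity decision."""

    def __init__(self, gamma=0.0):
        self.gamma = gamma  # assumed weight of the other-class centroid

    def fit(self, X, y):
        X = [l2_normalize(x) for x in X]
        dim = len(X[0])
        self.classes = sorted(set(y))
        self.protos = {}
        for c in self.classes:
            pos = mean_vector([x for x, lab in zip(X, y) if lab == c], dim)
            neg = mean_vector([x for x, lab in zip(X, y) if lab != c], dim)
            self.protos[c] = l2_normalize([p - self.gamma * n for p, n in zip(pos, neg)])
        return self

    def predict_one(self, x):
        x = l2_normalize(x)
        return max(self.classes, key=lambda c: dot(self.protos[c], x))

class OneVsRestLinearSVM:
    """Per-class linear SVMs trained with Pegasos (stand-in for the paper's SVM)."""

    def __init__(self, lam=0.01, epochs=50, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def fit(self, X, y):
        X = [list(x) + [1.0] for x in X]  # constant feature acts as a bias term
        rng = random.Random(self.seed)
        self.classes = sorted(set(y))
        self.weights = {}
        for c in self.classes:
            w, t = [0.0] * len(X[0]), 0
            for _ in range(self.epochs):
                for i in rng.sample(range(len(X)), len(X)):  # shuffled full pass
                    t += 1
                    eta = 1.0 / (self.lam * t)
                    target = 1.0 if y[i] == c else -1.0
                    violated = target * dot(w, X[i]) < 1.0  # hinge loss active?
                    w = [(1.0 - eta * self.lam) * wj for wj in w]  # L2 shrink step
                    if violated:
                        w = [wj + eta * target * xj for wj, xj in zip(w, X[i])]
            self.weights[c] = w
        return self

    def predict_one(self, x):
        x = list(x) + [1.0]
        return max(self.classes, key=lambda c: dot(self.weights[c], x))

def per_class_precision(y_true, y_pred, classes):
    """Precision of each predicted class; 0.0 if the class is never predicted."""
    prec = {}
    for c in classes:
        hits = [t for t, p in zip(y_true, y_pred) if p == c]
        prec[c] = sum(t == c for t in hits) / len(hits) if hits else 0.0
    return prec

class RocchioSVMCascade:
    """Rocchio labels everything; low-precision classes are re-checked by the SVM."""

    def __init__(self, precision_threshold=0.9, **svm_kwargs):
        self.threshold = precision_threshold  # hypothetical selection rule
        self.rocchio = Rocchio()
        self.svm = OneVsRestLinearSVM(**svm_kwargs)

    def fit(self, X_train, y_train, X_val, y_val):
        self.rocchio.fit(X_train, y_train)
        val_pred = [self.rocchio.predict_one(x) for x in X_val]
        self.precision = per_class_precision(y_val, val_pred, self.rocchio.classes)
        self.weak = {c for c, p in self.precision.items() if p < self.threshold}
        if self.weak:  # assumption: SVM is trained on the full training set
            self.svm.fit(X_train, y_train)
        return self

    def predict(self, X):
        labels = []
        for x in X:
            label = self.rocchio.predict_one(x)  # cheap first stage on every document
            if label in self.weak:  # second stage only for low-precision classes
                label = self.svm.predict_one(x)
            labels.append(label)
        return labels
```

For example, `RocchioSVMCascade(precision_threshold=0.9).fit(X_train, y_train, X_val, y_val).predict(X_test)` sends to the SVM only the documents that Rocchio assigns to a class whose validation precision falls below 0.9. Raising the threshold routes more documents to the slower second stage. That is the speed/precision trade-off the abstract describes.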