Anti-spam Filters Based on Support Vector Machines

Authors:
Chengwang Xie;Lixin Ding;Xin Du
Affiliations:
State Key Lab of Software Engineering, Wuhan University, Wuhan, China 430072;State Key Lab of Software Engineering, Wuhan University, Wuhan, China 430072;State Key Lab of Software Engineering, Wuhan University, Wuhan, China 430072 and Department of Information and Engineering, Shijiazhuang University of Economics, Shijiazhuang, China 050031
Venue:
ISICA '09 Proceedings of the 4th International Symposium on Advances in Computation and Intelligence
Year:
2009


Abstract

Spam has recently become an increasingly important problem. In this paper, a support vector machine (SVM) is used as the spam filter. A study is then made of the effect on the classification error rate when different subsets of the corpora are used, and of the filter accuracy when SVMs with linear, polynomial, or RBF kernels are used. The effect of the size of the attribute set is also investigated. Based on the experimental results and analysis, it is concluded that SVMs are a very good alternative for building anti-spam classifiers, offering a good combination of accuracy, consistency, and speed.
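For readers unfamiliar with the three kernel families named in the abstract, the following minimal sketch shows their standard textbook definitions applied to toy term-count vectors. It is not the authors' implementation: the function names, the parameter values (`degree`, `coef0`, `gamma`), and the three-word vocabulary are assumptions chosen for illustration, since the abstract does not state the settings used in the experiments.

```python
import math

def linear_kernel(x, y):
    # Plain dot product between two feature vectors.
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (x . y + coef0) ** degree; degree and coef0 are illustrative defaults.
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2); gamma is an illustrative default.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical term counts over a made-up 3-word vocabulary.
spam_msg = [2, 1, 0]
ham_msg = [0, 1, 3]

for name, kernel in [("linear", linear_kernel),
                     ("polynomial", polynomial_kernel),
                     ("rbf", rbf_kernel)]:
    print(name, kernel(spam_msg, ham_msg))
```

An SVM uses these kernel values as similarity scores between messages. Swapping the kernel changes the shape of the decision boundary without changing the rest of the training procedure, which is what makes a kernel-by-kernel accuracy comparison like the one described above possible.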