An effective spam filter based on a combined support vector machine approach

Authors:
Mumtaz M. Al-Mukhtar;Yasmine M. Tabra
Affiliations:
Department of Internet Engineering, College of Information Engineering, Al-Nahrain University, P.O. Box 64074, Aljadria, Baghdad, Iraq.;Department of Internet Engineering, College of Information Engineering, Al-Nahrain University, P.O. Box 64074, Aljadria, Baghdad, Iraq
Venue:
International Journal of Internet Technology and Secured Transactions
Year:
2012

Abstract

The volume of mass unsolicited e-mail, often known as spam, has recently increased enormously and has become a serious threat not only to the internet but also to society. Developing spam filters that can automatically and effectively eliminate the growing volume of unwanted e-mail remains challenging. This work presents a spam filter that combines a support vector machine (SVM) classifier for non-linear data (using an eligible kernel function) with appropriate data pre-processing. Data pre-processing is a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. The pre-processing steps include HTML removal, HTML replacement, de-obfuscation and stop-word removal. Results obtained with the pre-processing stage showed an improvement in classification performance. The estimated training and classification times for different document sizes indicate that the adopted method is practical and computationally efficient. Experimental results show that the approach can effectively enhance filtering performance.
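
The pre-processing stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the individual steps, so the interpretation of "HTML replacement" (swapping links and images for placeholder tokens), the character-substitution table, the stop-word list, the term-frequency features and the choice of an RBF kernel are all assumptions made here for illustration, as are the function names.

```python
import html
import math
import re
from collections import Counter

# Illustrative stop-word list and substitution table (assumptions, not from the paper).
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "in", "at", "for", "you", "your"}
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})


def preprocess(raw):
    """Return a list of cleaned tokens from a raw e-mail body."""
    # HTML replacement (assumed meaning): swap links and images for placeholder tokens.
    text = re.sub(r"(?is)<a\b[^>]*>.*?</a>", " htmllink ", raw)
    text = re.sub(r"(?i)<img\b[^>]*>", " htmlimage ", text)
    # HTML removal: drop the remaining tags, then decode entities such as &amp;.
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text).lower()
    # De-obfuscation: undo character substitutions, then rejoin letters
    # that were split by punctuation (for example "v.i.a.g.r.a").
    text = text.translate(LEET)
    text = re.sub(
        r"\b(?:[a-z][.\-_*]){2,}[a-z]\b",
        lambda m: re.sub(r"[.\-_*]", "", m.group()),
        text,
    )
    # Stop-word removal on the remaining alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(tokens, vocabulary):
    """Map tokens to a term-frequency feature vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); one possible non-linear kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


sample = '<p>Buy <b>V.1.A.G.R.A</b> at <a href="http://x.example">this link</a> for FR33 &amp; save!</p>'
tokens = preprocess(sample)
vocabulary = sorted(set(tokens))
vector = term_frequencies(tokens, vocabulary)
```

In a full filter, vectors like `vector` would be passed to an SVM trainer; the kernel function above only shows the kind of similarity measure such a classifier evaluates between two messages.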