The volume of unsolicited bulk e-mail, commonly known as spam, has increased enormously in recent years and has become a serious threat not only to the Internet but also to society. Developing spam filters that automatically and effectively eliminate this growing volume of unwanted e-mail remains challenging. This work presents a spam filter that combines a support vector machine (SVM) classifier for non-linear data, using a suitable kernel function, with appropriate data pre-processing. Data pre-processing is a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. The pre-processing steps are HTML removal, HTML replacement, de-obfuscation, and stop-word removal. Results obtained with pre-processing showed an improvement in classification performance. The estimated training and classification times for different document sizes indicate that the adopted method is practical and computationally efficient. Experimental results show that the approach effectively enhances filtering performance.
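The pre-processing stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the character substitution table, the function names, and the choice of an RBF kernel with `gamma=0.5` are all assumptions made for illustration. The "HTML replacement" step is omitted because the abstract does not define it.

```python
import html
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "is", "in", "for", "you", "your"}

# Hypothetical substitution table for de-obfuscation (e.g. "V1AGRA" -> "viagra").
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "@": "a", "$": "s"})


def preprocess(raw: str) -> list[str]:
    """Turn a raw e-mail body into a list of cleaned tokens."""
    text = re.sub(r"<[^>]+>", " ", raw)        # HTML removal: drop tags
    text = html.unescape(text)                 # decode entities such as &amp;
    text = text.lower().translate(LEET_MAP)    # de-obfuscation: undo substitutions
    # De-obfuscation: drop separators wedged between letters ("v.i.a.g.r.a").
    text = re.sub(r"(?<=[a-z])[.\-_*](?=[a-z])", "", text)
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


def to_vector(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Term-count feature vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]


def rbf_kernel(x: list[float], y: list[float], gamma: float = 0.5) -> float:
    """One example of a non-linear kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    vocabulary = ["buy", "now", "viagra"]
    spam = to_vector(preprocess("<p>Buy V1AGRA n0w!</p>"), vocabulary)
    ham = to_vector(preprocess("Meeting notes for the project"), vocabulary)
    print(spam, ham, rbf_kernel(spam, ham))
```

In a full filter, vectors like these would be passed to an SVM trainer. The substitution table here is deliberately crude: it also rewrites genuine digits, so a real system would need a more careful de-obfuscation rule.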