A comparative study for content-based dynamic spam classification using four machine learning algorithms

  • Authors:
  • Bo Yu; Zong-ben Xu

  • Affiliations:
  • School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Institute for Information and System Science, School of Science, Xi'an Jiaotong University, Xi'an 710049, China

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2008

Abstract

The growth in the number of email users has led to a dramatic increase in spam email over the past few years. In this paper, four machine learning algorithms, Naive Bayesian (NB), neural network (NN), support vector machine (SVM) and relevance vector machine (RVM), are proposed for spam classification, and an empirical evaluation of them on benchmark spam filtering corpora is presented. The experiments are performed with different training set sizes and extracted feature sizes. Experimental results show that the NN classifier is unsuitable for use alone as a spam rejection tool. In general, the SVM and RVM classifiers clearly outperform the NB classifier. Compared with SVM, RVM is shown to provide similar classification results using fewer relevance vectors than SVM uses support vectors, and with much faster testing time. Despite its slower learning procedure, RVM is more suitable than SVM for spam classification in applications that require low complexity.
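Naive Bayes is one of the four classifiers compared above. As a rough illustration of what a content-based NB spam filter does, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing in plain Python. It is not the authors' implementation: the tokenizer, the frequency-based feature cap (`max_features`), the class name `NaiveBayesSpamFilter`, and the toy training messages are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric words only.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch).

    Labels: 1 = spam, 0 = legitimate (ham). Both classes must appear in training.
    """

    def __init__(self, max_features=None):
        # Hypothetical cap on vocabulary size; None keeps every observed term.
        self.max_features = max_features

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        # Document frequency of each term, used as a simple stand-in for feature selection.
        doc_freq = Counter()
        for d in docs:
            doc_freq.update(set(d))
        self.vocab = {w for w, _ in doc_freq.most_common(self.max_features)}
        # Per-class term counts restricted to the selected vocabulary.
        self.word_counts = {0: Counter(), 1: Counter()}
        for d, y in zip(docs, labels):
            self.word_counts[y].update(w for w in d if w in self.vocab)
        n = len(labels)
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / n) for c in (0, 1)}
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def _log_posterior(self, tokens, c):
        # Unnormalized log P(c | message) = log P(c) + sum of log P(word | c).
        v = len(self.vocab)
        score = self.log_prior[c]
        for w in tokens:
            if w in self.vocab:
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        # Ties resolve to 0 (ham), the first candidate, which avoids rejecting legitimate mail.
        return max((0, 1), key=lambda c: self._log_posterior(tokens, c))


# Toy usage example with made-up messages.
train_texts = [
    "win money now claim your free prize",
    "cheap loans win cash fast",
    "meeting agenda for monday project review",
    "please review the attached project report",
]
train_labels = [1, 1, 0, 0]
clf = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(clf.predict("claim your free cash prize"))  # → 1 (spam)
print(clf.predict("project meeting on monday"))   # → 0 (ham)
```

Varying `max_features` and the number of training messages loosely mirrors the two experimental dimensions the abstract mentions (extracted feature size and training set size), although the paper's actual feature extraction and evaluation protocol are not reproduced here.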