Spam Detection Using Feature Selection and Parameters Optimization

  • Authors:
  • Sang Min Lee; Dong Seong Kim; Ji Ho Kim; Jong Sou Park

  • Venue:
  • CISIS '10 Proceedings of the 2010 International Conference on Complex, Intelligent and Software Intensive Systems
  • Year:
  • 2010

Abstract

Spam is no longer mere garbage but a genuine risk: it increasingly carries virus attachments and spyware agents that can compromise recipients' systems, so there is an emerging need for spam detection. Many spam detection techniques based on machine learning algorithms have been proposed. Because bulk-mailing tools have increased the volume of spam tremendously, spam detection techniques must be able to handle it efficiently. To this end, parameter optimization and feature selection have been proposed to reduce processing overhead while maintaining high detection rates. However, previous approaches have not taken into account variable importance or the optimal number of features, and so far no approach has used both together. In this paper, we propose an optimal spam detection model based on Random Forests (RF) that enables both parameter optimization and feature selection. We optimize two parameters of RF to maximize detection rates. We provide the variable importance of each feature so that irrelevant features can be easily eliminated. Furthermore, we determine an optimal number of selected features using two methods: (i) optimizing the parameters only once for the whole feature selection process, and (ii) re-optimizing the parameters in every feature elimination phase. We carry out experiments on the Spambase dataset and show the feasibility of our approach.
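The two strategies for choosing the number of features can be pictured as a backward-elimination loop wrapped around a parameter grid search. The following is a minimal sketch under stated assumptions, not the authors' implementation. Training and evaluating a Random Forest is abstracted into hypothetical `score` and `importance` callbacks; in practice these would wrap an RF library's cross-validated detection rate and its variable-importance output. The parameter pair `(n_trees, m_try)` is an assumption, because the abstract does not name the two RF parameters that are optimized. Removing exactly one least-important feature per phase is also an assumed elimination schedule. The toy callbacks at the bottom exist only to make the sketch runnable and carry no real spam data.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Params = Tuple[int, int]  # hypothetical (n_trees, m_try) pair; an assumption
ScoreFn = Callable[[List[str], Params], float]
ImportanceFn = Callable[[List[str], Params], Dict[str, float]]


def grid_search(features: List[str], grid: Sequence[Params],
                score: ScoreFn) -> Tuple[Params, float]:
    """Return the parameter pair with the highest score on this feature set."""
    best_params, best_score = grid[0], float("-inf")
    for params in grid:
        s = score(features, params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


def backward_elimination(features: List[str], grid: Sequence[Params],
                         score: ScoreFn, importance: ImportanceFn,
                         reoptimize: bool) -> Tuple[List[str], Params, float]:
    """Drop the least important feature per phase; keep the best subset seen.

    reoptimize=False: parameters are tuned once up front (method (i)).
    reoptimize=True:  parameters are re-tuned after every elimination (method (ii)).
    """
    current = list(features)
    params, s = grid_search(current, grid, score)
    best = (list(current), params, s)
    while len(current) > 1:
        imp = importance(current, params)
        weakest = min(current, key=lambda f: imp[f])  # least important feature
        current.remove(weakest)
        if reoptimize:
            params, s = grid_search(current, grid, score)
        else:
            s = score(current, params)
        # ">=" prefers the smaller feature set when scores tie
        if s >= best[2]:
            best = (list(current), params, s)
    return best


# --- Toy stand-ins for an RF evaluator (hypothetical, for illustration only) ---
RELEVANT = {"f0", "f1", "f2"}
FEATURES = [f"f{i}" for i in range(6)]
GRID = list(product([100, 200], [1, 2, 3]))


def toy_score(features: List[str], params: Params) -> float:
    n_trees, m_try = params
    hits = sum(f in RELEVANT for f in features)
    noise = len(features) - hits
    # Irrelevant features hurt; more trees help slightly; m_try = 2 is best.
    return hits - 0.1 * noise + 0.01 * n_trees / 100 - 0.05 * abs(m_try - 2)


def toy_importance(features: List[str], params: Params) -> Dict[str, float]:
    return {f: (1.0 if f in RELEVANT else 0.0) for f in features}


if __name__ == "__main__":
    for flag in (False, True):
        feats, p, s = backward_elimination(FEATURES, GRID, toy_score,
                                           toy_importance, reoptimize=flag)
        print(flag, sorted(feats), p, round(s, 2))
```

Passing `reoptimize=False` mirrors method (i), where one grid search serves the entire elimination run; `reoptimize=True` mirrors method (ii), which pays for a full grid search in every phase in exchange for parameters that track the shrinking feature set.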