Identifying spam e-mail based-on statistical header features and sender behavior

  • Authors:
  • Aziz Qaroush; Ismail M. Khater; Mahdi Washaha

  • Affiliations:
  • Birzeit University, Birzeit, West Bank, Palestine; Birzeit University, Birzeit, West Bank, Palestine; Birzeit University, Birzeit, West Bank, Palestine

  • Venue:
  • Proceedings of the CUBE International Information Technology Conference
  • Year:
  • 2012

Abstract

Email spam filtering is still a sophisticated and challenging problem, because spammers continue to develop new methods and techniques for their campaigns to defeat and confuse the spam filtering process. Using email header information also imposes additional challenges in classifying emails, because header information can easily be spoofed by spammers. In recent years, spam has also become a major problem at the social, economic, political, and organizational levels, because it decreases employee productivity and causes traffic congestion in networks. In this paper, we present a set of powerful and useful email header features extracted from the header session messages of publicly available datasets. We then apply a number of machine learning-based classifiers to the extracted header features, and we evaluate and compare their performance to show how effectively these features separate spam from ham messages. In the experimental stage, we apply the following classifiers: Random Forest (RF), C4.5 Decision Tree (J48), Voting Feature Intervals (VFI), Random Tree (RT), REPTree (REPT), Bayesian Network (BN), and Naïve Bayes (NB). The experimental results show that the RF classifier performs best, with an accuracy, precision, recall, and F-measure of 99.27%, 99.40%, 99.50%, and 99.50%, respectively, when all the features, including the trust feature, are used.
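The four evaluation measures named above can be computed from a binary confusion matrix. What follows is a minimal sketch. It assumes spam is treated as the positive class. The abstract does not say which class is positive, or whether the reported figures are averaged over both classes, so this is an assumption. The `Confusion` type and `metrics` function are hypothetical names chosen for illustration, and the counts in the usage example are made up. They are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Binary confusion matrix, assuming spam is the positive class."""
    tp: int  # spam correctly flagged as spam
    fp: int  # ham wrongly flagged as spam
    fn: int  # spam wrongly passed as ham
    tn: int  # ham correctly passed as ham


def metrics(c: Confusion) -> Dict[str, float]:
    """Return accuracy, precision, recall, and F-measure (F1)."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    # Precision: fraction of messages flagged as spam that really are spam.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    # Recall: fraction of actual spam that was caught.
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    print(metrics(Confusion(tp=90, fp=10, fn=5, tn=95)))
```

Because the F-measure is the harmonic mean of precision and recall, it always lies between the two values. A reported F-measure above both the precision and the recall therefore suggests that the figures were averaged or weighted across classes rather than computed for spam alone.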