Cost-sensitive three-way email spam filtering

  • Authors:
  • Bing Zhou;Yiyu Yao;Jigang Luo

  • Affiliations:
  • Department of Computer Science, Sam Houston State University, Huntsville, USA 77341;Department of Computer Science, University of Regina, Regina, Canada S4S 0A2;Department of Computer Science, University of Regina, Regina, Canada S4S 0A2

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2014

Abstract

Email spam filtering is typically treated as a binary classification problem that can be solved by machine learning algorithms. We argue that a three-way decision approach gives users a more meaningful way to handle their incoming emails with caution. A three-way spam filtering system produces three email folders instead of two: a suspected folder is added so that users can further examine suspicious emails, thereby reducing the chance of misclassification. Unlike existing ternary email spam filtering systems, we focus on two less-studied issues: the computation of the thresholds required to define the three email categories, and the interpretation of the cost-sensitive characteristics of spam filtering. Instead of supplying the thresholds based on an intuitive understanding of the levels of tolerance for errors, we systematically calculate them based on the decision-theoretic rough set model. A loss function is interpreted as the cost of making classification decisions, and the decision with the minimum overall cost is chosen. Experimental results show that the new approach reduces the error rate of misclassifying a legitimate email as spam and performs better with respect to cost sensitivity.
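
The idea of deriving the thresholds from a loss function and choosing the minimum-cost decision can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes spam is the positive class, uses the closed-form threshold expressions commonly given for the decision-theoretic rough set model (they are not stated in the abstract), and the loss values and all names (`Loss`, `thresholds`, `decide_by_cost`, `decide_by_threshold`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loss:
    """Hypothetical costs. First letter: action (P = spam folder,
    B = suspected folder, N = inbox). Second letter: true class
    (P = spam, N = legitimate)."""
    pp: float = 0.0   # spam filed as spam
    bp: float = 1.0   # spam filed as suspected
    np_: float = 4.0  # spam filed as legitimate
    pn: float = 10.0  # legitimate filed as spam (assumed most costly)
    bn: float = 1.0   # legitimate filed as suspected
    nn: float = 0.0   # legitimate filed as legitimate


def thresholds(loss: Loss) -> tuple[float, float]:
    """Return (alpha, beta) using the usual DTRS closed forms.

    Assumes pp <= bp < np_ and nn <= bn < pn, and that alpha > beta.
    """
    alpha = (loss.pn - loss.bn) / ((loss.pn - loss.bn) + (loss.bp - loss.pp))
    beta = (loss.bn - loss.nn) / ((loss.bn - loss.nn) + (loss.np_ - loss.bp))
    if not alpha > beta:
        raise ValueError("loss values do not yield a three-way region")
    return alpha, beta


def expected_costs(p_spam: float, loss: Loss) -> dict[str, float]:
    """Expected cost of each action given the estimated P(spam | email)."""
    q = 1.0 - p_spam
    return {
        "spam": loss.pp * p_spam + loss.pn * q,
        "suspected": loss.bp * p_spam + loss.bn * q,
        "legitimate": loss.np_ * p_spam + loss.nn * q,
    }


def decide_by_cost(p_spam: float, loss: Loss) -> str:
    """Pick the action whose expected cost is smallest."""
    costs = expected_costs(p_spam, loss)
    return min(costs, key=costs.get)


def decide_by_threshold(p_spam: float, alpha: float, beta: float) -> str:
    """Equivalent three-way rule expressed with the two thresholds."""
    if p_spam >= alpha:
        return "spam"
    if p_spam <= beta:
        return "legitimate"
    return "suspected"
```

With these assumed losses, filing a legitimate email as spam is penalised most heavily, so the spam threshold is pushed high and more borderline emails land in the suspected folder. Away from the threshold boundaries, the two decision functions should agree, which is the sense in which the thresholds summarise the minimum-cost rule.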