An incremental cluster-based approach to spam filtering

  • Authors:
  • Wen-Feng Hsiao;Te-Min Chang

  • Affiliations:
  • Department of Information Management, National Pingtung Institute of Commerce, Taiwan;Department of Information Management, National Sun Yat-sen University, Taiwan

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2008

Quantified Score

Hi-index 12.06

Visualization

Abstract

As email becomes a popular means for communication over the Internet, the problem of receiving unsolicited and undesired emails, called spam or junk mails, severely arises. To filter spam from legitimate emails, automatic classification approaches using text mining techniques are proposed. This kind of approaches, however, often suffers from low recall rate due to the natures of spam, skewed class distributions and concept drift. This research is thus to propose an appropriate classification approach to alleviating the problems of skewed class distributions and drifting concepts. A cluster-based classification method, called ICBC, is developed accordingly. ICBC contains two phases. In the first phase, it clusters emails in each given class into several groups, and an equal number of features (keywords) are extracted from each group to manifest the features in the minority class. In the second phase, we capacitate ICBC with an incremental learning mechanism that can adapt itself to accommodate the changes of the environment in a fast and low-cost manner. Three experiments are conducted to evaluate the performance of ICBC. The results show that ICBC can effectively deal with the issues of skewed and changing class distributions, and its incremental learning can also reduce the cost of re-training. The feasibility of the proposed approach is thus justified.