Feature selection by fuzzy inference and its application to spam-mail filtering

  • Authors:
  • Jong-Wan Kim;Sin-Jae Kang

  • Affiliations:
  • School of Computer and Information Technology, Daegu University, Gyeongsan, Gyeongbuk, South Korea;School of Computer and Information Technology, Daegu University, Gyeongsan, Gyeongbuk, South Korea

  • Venue:
  • CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
  • Year:
  • 2005

Abstract

We present a feature selection method based on fuzzy inference and apply it to spam-mail filtering. As a feature selection method, the proposed fuzzy inference approach outperforms both information gain and the chi-squared test in terms of error rate. Because the body of a junk mail often contains little text, it provides too few hints to distinguish spam from legitimate mail. To address this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote web pages, and extract hints from both the original email body and the fetched pages. Spam is then filtered with a two-phase approach: definite hints are used first, and less definite textual information is used second. In our experiment, the proposed two-phase method improved recall by 32.4% on average compared with using only the first phase or only the second phase.
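For reference, the two baseline feature-selection scores named in the abstract, information gain and the chi-squared statistic, can be computed for a single candidate term from a 2x2 table of document counts. The following is a minimal sketch of the standard textbook definitions only; it does not reproduce the fuzzy inference method, and the function names and the count layout (a, b, c, d) are assumptions made here for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(a, b, c, d):
    """Information gain of a binary term feature with respect to the class.

    Assumed count layout (hypothetical naming):
      a = spam mails containing the term
      b = legitimate mails containing the term
      c = spam mails without the term
      d = legitimate mails without the term
    """
    n = a + b + c + d
    h_class = entropy([(a + c) / n, (b + d) / n])
    h_cond = 0.0
    # Average the class entropy over "term present" and "term absent".
    for spam, legit in ((a, b), (c, d)):
        m = spam + legit
        if m:
            h_cond += (m / n) * entropy([spam / m, legit / m])
    return h_class - h_cond


def chi_squared(a, b, c, d):
    """Chi-squared statistic for the same 2x2 term/class table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0
```

Under both scores, a term that occurs in every spam mail and in no legitimate mail ranks highest, while a term spread evenly across the two classes scores zero. A selector would typically keep the top-scoring terms as features.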