Feature selection by fuzzy inference and its application to spam-mail filtering

Authors:
Jong-Wan Kim;Sin-Jae Kang
Affiliations:
School of Computer and Information Technology, Daegu University, Gyeonsan, Gyeongbuk, South Korea;School of Computer and Information Technology, Daegu University, Gyeonsan, Gyeongbuk, South Korea
Venue:
CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
Year:
2005

Abstract

We present a feature selection method based on fuzzy inference and apply it to spam-mail filtering. As a feature selection method, the proposed fuzzy inference approach outperforms the information gain and chi-squared test methods in terms of error rate. Because the body of a junk mail often contains little text, it provides insufficient hints for distinguishing spam from legitimate mail. To address this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote web pages, and extract hints from both the original email body and the fetched pages. Spam is then filtered with a two-phase approach: definite hints are used first, and less definite textual information is used second. In our experiments, the two-phase method improved recall by 32.4% on average over using only the first phase or only the second phase.
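
The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical: `fetch_page` stands in for whatever retrieves a linked web page, `definite_hints` for a set of phrases treated as conclusive, and `text_classifier` for the second-phase model trained on the selected features. The abstract does not say what counts as a definite hint or how the fuzzy inference scores features, so both are left as caller-supplied assumptions.

```python
import re
from typing import Callable, Iterable, List

# Simple pattern for http(s) links; an assumption, not taken from the paper.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(body: str) -> List[str]:
    """Return the hyperlinks found in an email body."""
    return URL_PATTERN.findall(body)


def gather_text(body: str, fetch_page: Callable[[str], str]) -> str:
    """Combine the email body with the text of every linked page."""
    parts = [body]
    for url in extract_urls(body):
        try:
            parts.append(fetch_page(url))
        except Exception:
            # An unreachable page simply contributes no extra hints.
            continue
    return "\n".join(parts)


def two_phase_filter(
    body: str,
    fetch_page: Callable[[str], str],
    definite_hints: Iterable[str],
    text_classifier: Callable[[str], bool],
) -> str:
    """Label a message 'spam' or 'legitimate' in two phases."""
    text = gather_text(body, fetch_page).lower()
    # Phase 1: a definite hint decides the label immediately.
    if any(hint in text for hint in definite_hints):
        return "spam"
    # Phase 2: fall back to the less definite textual classifier.
    return "spam" if text_classifier(text) else "legitimate"


# Usage with stand-in data (all values are made up for illustration).
pages = {"http://example.test/offer": "Cheap pills, buy now"}
fetch = lambda url: pages[url]
hints = {"cheap pills"}
classifier = lambda text: "lottery" in text

print(two_phase_filter("Visit http://example.test/offer now", fetch, hints, classifier))
print(two_phase_filter("Meeting at noon", fetch, hints, classifier))
```

Passing the fetcher and classifier in as functions keeps the sketch independent of any particular HTTP library or learning algorithm, so either phase can be replaced without touching the control flow.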