Training algorithms for linear text classifiers
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Data mining: practical machine learning tools and techniques with Java implementations
Data mining: practical machine learning tools and techniques with Java implementations
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Support vector machines for spam categorization
IEEE Transactions on Neural Networks
Constructing a User Preference Ontology for Anti-spam Mail Systems
CAI '07 Proceedings of the 20th conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence
Hi-index | 0.00 |
We present a feature selection method by fuzzy inference and its application to spam-mail filtering in this work. The proposed fuzzy inference method outperforms information gain and chi squared test methods as a feature selection method in terms of error rate. In the case of junk mails, since the mail body has little text information, it provides insufficient hints to distinguish spam mails from legitimate ones. To address this problem, we follow hyperlinks contained in the email body, fetch contents of a remote web page, and extract hints from both original email body and fetched web pages. A two-phase approach is applied to filter spam mails in which definite hint is used first, and then less definite textual information is used. In our experiment, the proposed two-phase method achieved an improvement of recall by 32.4% on the average over the 1st phase or the 2nd phase only works.