Designing a spam-filtering system that runs efficiently on heavily loaded servers is particularly important to widely used email service providers (ESPs) such as Hotmail, Yahoo, and Gmail, which must handle millions of emails every day. The two primary challenges these companies face in spam filtering are efficiency and scalability. This study develops an efficient and scalable spam-filtering framework for heavily loaded email servers. We propose an Intelligent Hybrid Spam-Filtering Framework (IHSFF) that detects spam by analyzing only email headers. Its efficiency and scalability make the framework especially suitable for very large email servers. The proposed filtering system may be deployed alone or in conjunction with other filters. We extract five features from the email header: "originator field", "destination field", "X-Mailer field", "sender server IP address", and "mail subject". Email subjects are digitized using an n-gram-based algorithm for better performance. Using real-world data from a well-known ESP in China, we test the model with various machine-learning algorithms. Experimental results show that the framework using the Random Forest algorithm achieves good accuracy, recall, precision, and F-measure. With the addition of the MetaCost framework, the model performs stably and incurs small costs in various cost-sensitive scenarios.
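As an illustration of the header-only feature extraction described above, the following is a minimal sketch in Python using only the standard library. It is not the authors' implementation: the abstract does not specify how the n-gram digitization works, so the hashed character-trigram count vector here is an assumption, as are the function names `extract_header_features` and `subject_ngram_vector`, the vector length of 32, and the choice to read the sender server IP from the topmost `Received` header.

```python
import re
import zlib
from email import message_from_string

# Matches a bracketed IPv4 address such as "[192.0.2.10]" in a Received header.
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def subject_ngram_vector(subject, n=3, dims=32):
    """Hash character n-grams of the subject into a fixed-length count vector.

    Assumption: the paper's n-gram algorithm is not given in the abstract;
    this hashing scheme is only one plausible way to digitize a subject.
    """
    text = subject.lower()
    vec = [0] * dims
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dims] += 1
    return vec


def extract_header_features(raw_email):
    """Return the five header features named in the abstract as a dict."""
    msg = message_from_string(raw_email)
    received = msg.get_all("Received") or []
    # Assumption: the topmost Received header was added by the receiving
    # server and records the IP of the connecting sender server.
    match = IP_RE.search(received[0]) if received else None
    subject = msg.get("Subject", "")
    return {
        "originator": msg.get("From", ""),
        "destination": msg.get("To", ""),
        "x_mailer": msg.get("X-Mailer", ""),
        "sender_ip": match.group(1) if match else "",
        "subject_vector": subject_ngram_vector(subject),
    }


# Hypothetical sample message using reserved example addresses.
SAMPLE = (
    "Received: from mail.example.com (mail.example.com [192.0.2.10]) "
    "by mx.example.net; Mon, 1 Jan 2024 00:00:00 +0000\n"
    "From: a@example.com\n"
    "To: b@example.net\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "Subject: Cheap offer\n"
    "\n"
    "body text is never inspected\n"
)

features = extract_header_features(SAMPLE)
```

The categorical fields would still need to be encoded (for example, by hashing or one-hot encoding) before being passed to a classifier such as Random Forest; that step, like the cost-sensitive MetaCost wrapper, is left out of this sketch.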