SDAI: An integral evaluation methodology for content-based spam filtering models

Authors:
Noemí Pérez-Díaz; David Ruano-Ordás; Florentino Fdez-Riverola; José R. Méndez
Affiliations:
Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, University of Vigo, 32004 Ourense, Spain
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

The Tragedy of the Commons theory introduced by Hardin (1968) showed how shared and limited resources are completely depleted as an effect of human behaviour. By analogy, common spamming activities can be properly modelled by this well-established theory, and a young Internet security industry has recently emerged to fight spam. However, the massive increase in spam deliveries in recent years has created the need for a significant improvement in filter accuracy. In this context, current research mainly focuses on providing a wide variety of content-based techniques able to overcome common spam filtering shortcomings. Although scientific works generally include a theoretical evaluation of filtering performance, most evaluation protocols are not suitable for correctly assessing how models perform when filters operate in real environments. To bridge the gap between basic research and the applied deployment of well-known spam filtering techniques, this work proposes a novel, straightforward evaluation methodology able to rank available models from four different but complementary perspectives: static, dynamic, adaptive and internationalisation. In the present study, we applied our SDAI methodology to compare eight well-known content-based spam filtering techniques using several established accuracy measures. The results showed the effect of knowledge grain size and revealed several unexpected situations related to the behaviour of the analysed models.
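The abstract does not name the accuracy measures SDAI uses. As a hedged illustration only, the sketch below computes three measures commonly reported in content-based spam filtering work (spam recall, spam precision and Total Cost Ratio, TCR) from confusion counts, then ranks filters by a chosen measure. The class and function names, the example counts and the false-positive cost weight `lam` are hypothetical assumptions; this is not the SDAI ranking procedure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ConfusionCounts:
    """Hypothetical counts from one filter test run (spam is the positive class)."""
    spam_as_spam: int    # true positives
    spam_as_legit: int   # false negatives (spam that got through)
    legit_as_spam: int   # false positives (legitimate mail blocked)
    legit_as_legit: int  # true negatives


def spam_recall(c: ConfusionCounts) -> float:
    """Fraction of all spam messages that the filter caught."""
    total_spam = c.spam_as_spam + c.spam_as_legit
    return c.spam_as_spam / total_spam if total_spam else 0.0


def spam_precision(c: ConfusionCounts) -> float:
    """Fraction of messages flagged as spam that really were spam."""
    flagged = c.spam_as_spam + c.legit_as_spam
    return c.spam_as_spam / flagged if flagged else 0.0


def total_cost_ratio(c: ConfusionCounts, lam: float = 1.0) -> float:
    """TCR: cost of using no filter divided by cost of using this filter.

    Each false positive is assumed to cost `lam` times a false negative.
    Values above 1 mean the filter is worth deploying.
    """
    total_spam = c.spam_as_spam + c.spam_as_legit
    weighted_errors = lam * c.legit_as_spam + c.spam_as_legit
    return float("inf") if weighted_errors == 0 else total_spam / weighted_errors


def rank_by(results: Dict[str, ConfusionCounts],
            metric: Callable[[ConfusionCounts], float]) -> List[str]:
    """Return filter names ordered from best to worst under `metric`."""
    return sorted(results, key=lambda name: metric(results[name]), reverse=True)


# Hypothetical results for two filters evaluated on 100 spam + 100 legitimate messages.
results = {
    "filter_a": ConfusionCounts(spam_as_spam=90, spam_as_legit=10, legit_as_spam=2, legit_as_legit=98),
    "filter_b": ConfusionCounts(spam_as_spam=95, spam_as_legit=5, legit_as_spam=8, legit_as_legit=92),
}

print(rank_by(results, total_cost_ratio))  # ['filter_a', 'filter_b']
print(rank_by(results, spam_recall))       # ['filter_b', 'filter_a']
```

As the two `print` lines show, the ranking can flip depending on the measure chosen, so a single score may not capture how a filter would behave once deployed.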