Review: A review of machine learning approaches to Spam filtering

Authors:
Thiago S. Guzella;Walmir M. Caminhas
Affiliations:
Department of Electrical Engineering, Federal University of Minas Gerais, Ave. Antonio Carlos, 6627, Belo Horizonte (MG) 31270-910, Brazil;Department of Electrical Engineering, Federal University of Minas Gerais, Ave. Antonio Carlos, 6627, Belo Horizonte (MG) 31270-910, Brazil
Venue:
Expert Systems with Applications: An International Journal
Year:
2009

Quantified Score

Hi-index	12.06

Abstract

In this paper, we present a comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual and image-based approaches. Rather than treating Spam filtering as a standard classification problem, we highlight the importance of accounting for specific characteristics of the problem, especially concept drift, when designing new filters. We discuss two particularly important aspects that are not widely recognized in the literature: the difficulties in updating a classifier based on the bag-of-words representation, and a major difference between two early naive Bayes models. Overall, we conclude that while important advances have been made in recent years, several aspects remain to be explored, especially under more realistic evaluation settings.
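
As a rough illustration of why updating a bag-of-words classifier can be awkward under concept drift, the sketch below fixes a vocabulary at training time and then vectorizes a later message that uses obfuscated spellings. This is one plausible reading of the difficulty named in the abstract, not the authors' analysis; the function names (`build_vocabulary`, `vectorize`, `out_of_vocabulary`) and the toy messages are assumptions made up for this example.

```python
from collections import Counter


def build_vocabulary(messages):
    """Fix the feature set from the tokens seen at training time."""
    tokens = sorted({tok for msg in messages for tok in msg.lower().split()})
    return {tok: index for index, tok in enumerate(tokens)}


def vectorize(message, vocabulary):
    """Map a message to term counts; tokens outside the vocabulary are dropped."""
    counts = Counter(message.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


def out_of_vocabulary(message, vocabulary):
    """List the tokens that the fixed representation cannot express."""
    return sorted({tok for tok in message.lower().split() if tok not in vocabulary})


# Hypothetical toy data: two training messages, then a later "drifted" message.
training = ["cheap pills online", "meeting agenda attached"]
vocab = build_vocabulary(training)

drifted = "cheap p1lls 0nline"
vec = vectorize(drifted, vocab)
missed = out_of_vocabulary(drifted, vocab)

print(vec)     # only the count for "cheap" is non-zero
print(missed)  # the obfuscated tokens never reach the classifier
```

Under these assumptions, accommodating the new tokens means growing the vocabulary, which changes the dimensionality of every feature vector and, for many classifiers, requires retraining or restructuring the model rather than a simple incremental update.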