Applying lazy learning algorithms to tackle concept drift in spam filtering

Authors:
F. Fdez-Riverola;E. L. Iglesias;F. Díaz;J. R. Méndez;J. M. Corchado
Affiliations:
Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain;Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain;Dept. Informática, University of Valladolid, Escuela Universitaria de Informática, Plaza Santa Eulalia 9-11, 40005 Segovia, Spain;Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain;Dept. Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain
Venue:
Expert Systems with Applications: An International Journal
Year:
2007

Quantified Score

Hi-index	12.07

Abstract

A great number of machine learning techniques have been applied to problems where data is collected over an extended period of time. However, a difficulty with many real-world applications is that the distribution underlying the data is likely to change over time. In these situations, many global eager learners are unable to adapt to local concept drift. Concept drift in spam is particularly difficult to handle because spammers actively change the nature of their messages to elude spam filters. Algorithms that track concept drift must be able to identify a change in the target concept (spam or legitimate e-mails) without direct knowledge of the underlying shift in distribution. In this paper we show how a previously successful instance-based reasoning e-mail filtering model can be improved to better track concept drift in the spam domain. Our proposal is based on the definition of two complementary techniques that select both the terms and the e-mails representative of the current situation. The enhanced system is evaluated against other well-known, successful lazy learning approaches in two scenarios, all within a cost-sensitive framework. The experimental results are very promising and support the idea that instance-based reasoning systems offer a number of advantages in tackling concept drift in dynamic problems such as anti-spam filtering.
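
To make the general idea concrete, the following is a minimal sketch of a lazy (instance-based) filter that keeps only recent e-mails, so that old cases stop influencing decisions as the concept drifts. It is not the model described in the paper: the class name `SlidingWindowKNN`, the whitespace tokenizer, the Jaccard similarity, the fixed-size window, and the `spam_threshold` parameter are all assumptions chosen for illustration. Raising the threshold is one simple way to mimic a cost-sensitive setting in which misclassifying legitimate mail is the more expensive error.

```python
from collections import deque


def tokenize(text):
    """Lowercase whitespace tokenizer; returns the set of distinct terms."""
    return set(text.lower().split())


class SlidingWindowKNN:
    """Hypothetical lazy spam filter: k-NN over the most recent e-mails only."""

    def __init__(self, window_size=100, k=3, spam_threshold=0.5):
        # deque(maxlen=...) silently evicts the oldest case when full,
        # which acts as a crude instance-selection step under drift.
        self.cases = deque(maxlen=window_size)
        self.k = k
        self.spam_threshold = spam_threshold

    def add(self, text, is_spam):
        """Store a labelled e-mail; no model is trained (lazy learning)."""
        self.cases.append((tokenize(text), is_spam))

    def classify(self, text):
        """Return True if the fraction of spam neighbours exceeds the threshold."""
        terms = tokenize(text)

        def jaccard(other):
            union = terms | other
            return len(terms & other) / len(union) if union else 0.0

        ranked = sorted(self.cases, key=lambda case: jaccard(case[0]), reverse=True)
        neighbours = ranked[: self.k]
        if not neighbours:
            return False  # assumption: with no evidence, treat as legitimate
        spam_votes = sum(1 for _, is_spam in neighbours if is_spam)
        return spam_votes / len(neighbours) > self.spam_threshold


spam_filter = SlidingWindowKNN(window_size=4, k=1)
spam_filter.add("cheap pills online", True)
spam_filter.add("meeting agenda monday", False)
print(spam_filter.classify("cheap pills now"))
```

In this sketch all work happens at query time, which is what allows the case base to be changed between queries without retraining. A real system would also need a term-selection step and a more careful similarity measure.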