A case-based technique for tracking concept drift in spam filtering

Authors:
Sarah Jane Delany;Pádraig Cunningham;Alexey Tsymbal;Lorcan Coyle
Affiliations:
Dublin Institute of Technology, Kevin Street, Dublin 8, Ireland;Trinity College Dublin, College Green, Dublin 2, Ireland;Trinity College Dublin, College Green, Dublin 2, Ireland;Trinity College Dublin, College Green, Dublin 2, Ireland
Venue:
Knowledge-Based Systems
Year:
2005

Abstract

Spam filtering is a particularly challenging machine learning task because both the data distribution and the concept being learned change over time. It exhibits an especially awkward form of concept drift, as the change is driven by spammers seeking to circumvent spam filters. In this paper we show that lazy learning techniques are appropriate for such dynamically changing contexts. We present a case-based system for spam filtering that can learn dynamically. We evaluate its performance as the case-base is updated with new cases. We also explore the benefit of periodically redoing the feature selection process to bring new features into play. Our evaluation shows that these two levels of model update are effective in tracking concept drift.
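
The two levels of model update described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the class name `CaseBaseFilter`, the helper `select_features`, the Jaccard similarity, the nearest-neighbour vote, and the document-frequency-difference score used to rank features are all assumptions chosen to keep the example short.

```python
from collections import Counter


def tokenize(text):
    """Represent an email as a set of lowercase word tokens (assumed representation)."""
    return set(text.lower().split())


def select_features(cases, n_features):
    """Pick the words whose document frequency differs most between classes.

    This score is a simple stand-in for a real feature-selection measure.
    """
    spam = [tokens for tokens, label in cases if label == "spam"]
    ham = [tokens for tokens, label in cases if label == "ham"]
    spam_df = Counter(w for tokens in spam for w in tokens)
    ham_df = Counter(w for tokens in ham for w in tokens)

    def score(word):
        return abs(spam_df[word] / max(len(spam), 1) - ham_df[word] / max(len(ham), 1))

    vocab = set(spam_df) | set(ham_df)
    # Ties are broken alphabetically so the selection is deterministic.
    return set(sorted(vocab, key=lambda w: (-score(w), w))[:n_features])


class CaseBaseFilter:
    """Hypothetical lazy (k-nearest-neighbour) spam filter over a case-base."""

    def __init__(self, cases, k=1, n_features=5):
        self.k = k
        self.n_features = n_features
        self.cases = [(tokenize(text), label) for text, label in cases]
        self.reselect_features()

    def add_case(self, text, label):
        """Update level 1: extend the case-base with a newly labelled email."""
        self.cases.append((tokenize(text), label))

    def reselect_features(self):
        """Update level 2: redo feature selection over the current case-base."""
        self.features = select_features(self.cases, self.n_features)

    def classify(self, text):
        """Vote among the k most similar cases, comparing selected features only."""
        query = tokenize(text) & self.features

        def similarity(case_tokens):
            shared = case_tokens & self.features
            union = query | shared
            return len(query & shared) / len(union) if union else 0.0

        ranked = sorted(self.cases, key=lambda c: similarity(c[0]), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


spam_filter = CaseBaseFilter([
    ("cheap pills online", "spam"),
    ("buy cheap pills now", "spam"),
    ("project meeting tomorrow", "ham"),
    ("lunch meeting notes", "ham"),
])
spam_filter.add_case("crypto bonus", "spam")         # new cases arrive
spam_filter.add_case("crypto wallet bonus", "spam")
spam_filter.reselect_features()                      # new vocabulary enters the features
print(spam_filter.classify("crypto bonus offer"))
```

In this toy setup, adding the new cases alone is not enough: until `reselect_features` is called, words such as "crypto" are not among the selected features, so a query built from the new vocabulary has zero similarity to every case. That is the motivation for pairing case-base updates with periodic feature reselection.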