A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists

Authors:
Georgios Sakkis;Ion Androutsopoulos;Georgios Paliouras;Vangelis Karkaletsis;Constantine D. Spyropoulos;Panagiotis Stamatopoulos
Affiliations:
Institute of Informatics and Telecommunications, National Centre for Scientific Research (NCSR) “Demokritos”, GR-153 10 Ag. Paraskevi, Athens, Greece. gsakis@iit.demokritos.gr ...;Department of Informatics, Athens University of Economics and Business, Patission 76, GR-104 34, Athens, Greece. ion@aueb.gr;Institute of Informatics and Telecommunications, National Centre for Scientific Research (NCSR) “Demokritos”, GR-153 10 Ag. Paraskevi, Athens, Greece. paliourg@iit.demokritos.g ...;Institute of Informatics and Telecommunications, National Centre for Scientific Research (NCSR) “Demokritos”, GR-153 10 Ag. Paraskevi, Athens, Greece. vangelis@iit.demokritos.g ...;Institute of Informatics and Telecommunications, National Centre for Scientific Research (NCSR) “Demokritos”, GR-153 10 Ag. Paraskevi, Athens, Greece. costass@iit.demokritos.gr ...;Department of Informatics, University of Athens, TYPA Buildings, Panepistimiopolis, GR-157 71, Athens, Greece. T.Stamatopoulos@di.uoa.gr
Venue:
Information Retrieval
Year:
2003

Abstract

This paper presents an extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization that attempts to automatically identify the unsolicited commercial messages that flood mailboxes. Focusing on anti-spam filtering for mailing lists, we thoroughly investigate the effectiveness of a memory-based anti-spam filter using a publicly available corpus. The investigation covers different attribute-weighting and distance-weighting schemes, and studies the effect of the neighborhood size, the size of the attribute set, and the size of the training corpus. Three different cost scenarios are identified, and suitable cost-sensitive evaluation functions are employed. We conclude that memory-based anti-spam filtering for mailing lists is practically feasible, especially when combined with additional safety nets. Compared to a previously tested Naive Bayes filter, the memory-based filter performs better on average, particularly when the misclassification cost for non-spam messages is high.
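
To make the ingredients named in the abstract concrete, the following Python sketch shows one plausible shape for a memory-based (k-nearest-neighbour) filter with attribute weights, distance-weighted voting, and a cost-sensitive decision threshold. It is an illustration under stated assumptions, not the authors' implementation: the binary attribute vectors, the weighted overlap distance, the 1/(1+d) vote weighting, and the names `overlap_distance`, `classify`, and `cost_ratio` are all hypothetical choices made here for the example.

```python
def overlap_distance(a, b, weights):
    """Weighted overlap distance: sum the weights of attributes that differ.

    `weights` stands in for an attribute-weighting scheme (assumption: any
    non-negative per-attribute score would do for this sketch).
    """
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def classify(message, training, weights, k=3, cost_ratio=1.0):
    """Label `message` as 'spam' or 'legit' from its k nearest stored examples.

    training   -- list of (attribute_vector, label) pairs kept in memory
    k          -- neighbourhood size
    cost_ratio -- hypothetical parameter: how many times more costly it is to
                  block a legitimate message than to let a spam message pass
    """
    # Memory-based learning: no model is built; we just rank stored examples.
    ranked = sorted(
        training, key=lambda ex: overlap_distance(message, ex[0], weights)
    )
    votes = {"spam": 0.0, "legit": 0.0}
    for vector, label in ranked[:k]:
        d = overlap_distance(message, vector, weights)
        # Distance weighting (assumed form): closer neighbours count more.
        votes[label] += 1.0 / (1.0 + d)

    spam_share = votes["spam"] / (votes["spam"] + votes["legit"])
    # Cost-sensitive rule: demand stronger evidence for 'spam' as the cost of
    # misclassifying legitimate mail grows.
    threshold = cost_ratio / (1.0 + cost_ratio)
    return "spam" if spam_share > threshold else "legit"


# Toy data, invented for illustration only.
training = [
    ((1, 1, 0), "spam"),
    ((1, 0, 0), "spam"),
    ((0, 0, 1), "legit"),
    ((0, 1, 1), "legit"),
]
weights = (1.0, 1.0, 1.0)
message = (1, 1, 0)

print(classify(message, training, weights, k=3, cost_ratio=1.0))
print(classify(message, training, weights, k=3, cost_ratio=9.0))
```

With equal costs the toy message is labelled spam, while raising `cost_ratio` pushes the threshold up until the same evidence is no longer enough, which is the kind of trade-off the cost scenarios in the abstract are meant to explore.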