SpamHunting: An instance-based reasoning system for spam labelling and filtering

Authors:
F. Fdez-Riverola, E. L. Iglesias, F. Díaz, J. R. Méndez, J. M. Corchado
Affiliations:
F. Fdez-Riverola, E. L. Iglesias, J. R. Méndez: Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
F. Díaz: Dept. Informática, University of Valladolid, Escuela Universitaria de Informática, Plaza Santa Eulalia, 9-11, 40005, Segovia, Spain
J. M. Corchado: Dept. Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain
Venue:
Decision Support Systems
Year:
2007

Abstract

In this paper we present an instance-based reasoning e-mail filtering model that outperforms classical machine learning techniques and other successful lazy learning approaches in the domain of anti-spam filtering. The architecture of the learning-based anti-spam filter is built on a tuneable, enhanced instance retrieval network that is able to accurately generalize e-mail representations. Similar messages are reused through a simple unanimous voting mechanism that decides whether the target case is spam or not. Before the system gives its final response, a revision stage is performed, but only when the assigned class is spam; in this stage the system applies general knowledge in the form of meta-rules.
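The retrieve, reuse and revise flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the term-overlap retrieval is a hypothetical stand-in for the enhanced instance retrieval network, the rule that any non-unanimous vote falls back to "legitimate" is an assumption, and the meta-rules (`MetaRule`, the `"agenda"` rule in the usage example) are purely hypothetical placeholders. The names `Case`, `retrieve`, `unanimous_vote` and `classify` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Case:
    terms: FrozenSet[str]  # relevant terms extracted from a stored e-mail
    is_spam: bool          # known class of the stored e-mail


# Hypothetical meta-rule: returns True when it vetoes a spam label.
MetaRule = Callable[[FrozenSet[str]], bool]


def retrieve(memory: Sequence[Case], target: FrozenSet[str], k: int = 3) -> List[Case]:
    """Return the k stored cases sharing the most terms with the target.

    Assumed stand-in for the paper's instance retrieval network: similarity
    is simply the size of the term-set intersection (ties keep memory order).
    """
    ranked = sorted(memory, key=lambda c: len(c.terms & target), reverse=True)
    return ranked[:k]


def unanimous_vote(neighbours: Sequence[Case]) -> bool:
    # Assumption: label spam only if every retrieved case is spam;
    # any disagreement, or an empty neighbourhood, yields legitimate.
    return bool(neighbours) and all(c.is_spam for c in neighbours)


def classify(memory: Sequence[Case], target: FrozenSet[str],
             meta_rules: Sequence[MetaRule], k: int = 3) -> bool:
    label = unanimous_vote(retrieve(memory, target, k))
    if label:  # revision runs only when the assigned class is spam
        if any(rule(target) for rule in meta_rules):
            label = False  # a firing meta-rule overturns the spam label
    return label


# Toy case memory (illustrative data, not from the paper).
memory = [
    Case(frozenset({"viagra", "cheap", "offer"}), True),
    Case(frozenset({"cheap", "offer", "click"}), True),
    Case(frozenset({"offer", "winner", "click"}), True),
    Case(frozenset({"meeting", "agenda", "project"}), False),
    Case(frozenset({"project", "deadline", "report"}), False),
]

if __name__ == "__main__":
    print(classify(memory, frozenset({"cheap", "offer", "click"}), []))
    print(classify(memory, frozenset({"project", "report", "offer"}), []))
    print(classify(memory, frozenset({"cheap", "offer", "click", "agenda"}),
                   [lambda t: "agenda" in t]))
```

In this toy run the first message retrieves three spam cases and is labelled spam; the second retrieves a mixed neighbourhood, so unanimity fails and it is labelled legitimate; the third would be spam by vote, but the hypothetical meta-rule fires during revision and overturns the label. Requiring unanimity before assigning the spam class is a conservative choice, since it trades some missed spam for fewer legitimate messages being discarded.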