Clustering ensemble for spam filtering

Authors:
Santiago Porras;Bruno Baruque;Belén Vaquerizo;Emilio Corchado
Affiliations:
Civil Engineering Department, University of Burgos;Civil Engineering Department, University of Burgos;Civil Engineering Department, University of Burgos;Departamento de Informática y Automática, Universidad de Salamanca
Venue:
HAIS'11 Proceedings of the 6th international conference on Hybrid artificial intelligent systems - Volume Part II
Year:
2011

Abstract

One of the main problems that modern e-mail systems face is managing the large volume of spam or junk mail they receive. These systems are expected to distinguish between legitimate mail and spam, so that the end user is presented with as much relevant information as possible. This study presents a novel hybrid intelligent system that uses both unsupervised and supervised learning and can be easily adapted for use in an individual or a collaborative setting. The system divides the spam filtering problem into two stages. First, it partitions the input data space into regions of similar messages. Then it generates several simple classifiers, each of which is trained to classify correctly the messages contained in one of the previously determined regions. In this way the efficiency of each classifier increases, as each one can specialize in separating spam from a certain type of related messages. The hybrid system has been tested on a real e-mail database, and its results are compared with those obtained from other common classification methods. The novel hybrid technique proves to be effective for the problem under study.
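
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering or classification algorithms here, so this example assumes plain k-means for the partitioning stage and a nearest-class-mean rule as the simple per-region classifier. The two features and all data values are hypothetical toy numbers.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))

def kmeans(X, k, iters=10):
    """Stage 1 (assumed stand-in): k-means with farthest-point initialisation."""
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in X:
            groups[nearest(p, centers)].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def class_means(X, y):
    """Mean feature vector of each label."""
    by_label = {}
    for p, label in zip(X, y):
        by_label.setdefault(label, []).append(p)
    return {label: mean(pts) for label, pts in by_label.items()}

def fit(X, y, k=2):
    """Stage 2: train one simple classifier per region found in stage 1."""
    centers = kmeans(X, k)
    fallback = class_means(X, y)  # used only if a region has no messages
    experts = []
    for c in range(k):
        idx = [i for i, p in enumerate(X) if nearest(p, centers) == c]
        local = class_means([X[i] for i in idx], [y[i] for i in idx])
        experts.append(local or fallback)
    return centers, experts

def predict(model, point):
    """Route the message to its region, then ask that region's classifier."""
    centers, experts = model
    expert = experts[nearest(point, centers)]
    return min(expert, key=lambda label: dist(point, expert[label]))

# Hypothetical features: [link density, capital-letter ratio].
X = [[0.10, 0.10], [0.15, 0.12], [0.12, 0.08],   # region A, legitimate
     [0.10, 0.50], [0.14, 0.55], [0.11, 0.60],   # region A, spam
     [0.90, 0.50], [0.85, 0.55], [0.88, 0.60],   # region B, legitimate
     [0.90, 0.95], [0.86, 1.00], [0.92, 0.90]]   # region B, spam
y = ["ham"] * 3 + ["spam"] * 3 + ["ham"] * 3 + ["spam"] * 3

model = fit(X, y, k=2)
print(predict(model, [0.12, 0.50]))  # spam
print(predict(model, [0.88, 0.50]))  # ham
```

In this toy data the two query messages share the same second feature value but receive different labels, because each is judged by the classifier trained on its own region. That is the kind of specialization the abstract attributes to the per-region classifiers.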