Analyzing the Performance of Spam Filtering Methods When Dimensionality of Input Vector Changes

Authors:
J. R. Méndez;B. Corzo;D. Glez-Peña;F. Fdez-Riverola;F. Díaz
Affiliations:
Computer Science Dept., University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain;Dept. Advertising Graphics, Arts College of Oviedo, C/ Julián Clavería, 12, 33006, Oviedo, Spain;Computer Science Dept., University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain;Computer Science Dept., University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain;Computer Science Dept., University of Valladolid, Escuela Universitaria de Informática, Plaza Santa Eulalia, 9-11, 40005, Segovia, Spain
Venue:
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2007



Abstract

Spam is a complex problem that hinders the use of Internet resources. Several authorities have warned about the scale of this problem and urged everyone to fight it. In this paper we present an extensive analysis of how changing the dimensionality of the message representation affects the accuracy of some well-known classical spam filtering techniques. The conclusions drawn from the experiments will be useful for comparing the effects of dimensionality changes on classical filtering techniques with their effects on a successful spam filter model called SpamHunting.
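
As a rough illustration of the kind of experiment the abstract describes, the sketch below varies the number of terms used to represent each message and reports the accuracy of a simple classifier at each size. It is not the authors' setup. The toy corpus, the document-frequency term ranking, the Bernoulli naive Bayes classifier, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter

# Hypothetical toy corpus (1 = spam, 0 = legitimate); not data from the paper.
TRAIN = [
    ("cheap pills buy now", 1),
    ("win money now click here", 1),
    ("buy cheap watches click", 1),
    ("meeting agenda for monday", 0),
    ("project report attached here", 0),
    ("lunch on monday with the team", 0),
]
TEST = [
    ("click now to win cheap pills", 1),
    ("agenda for the project meeting", 0),
]

def select_terms(docs, k):
    """Keep the k terms with the highest document frequency.
    Assumption: a simple stand-in for a real feature selection method."""
    df = Counter()
    for text, _ in docs:
        df.update(set(text.split()))
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    return ranked[:k]

def to_vector(text, terms):
    """Binary input vector: one component per selected term."""
    tokens = set(text.split())
    return [1 if t in tokens else 0 for t in terms]

def train_nb(docs, terms):
    """Bernoulli naive Bayes with Laplace smoothing."""
    counts = {0: [0] * len(terms), 1: [0] * len(terms)}
    n = {0: 0, 1: 0}
    for text, label in docs:
        n[label] += 1
        for i, v in enumerate(to_vector(text, terms)):
            counts[label][i] += v
    probs = {c: [(cnt + 1) / (n[c] + 2) for cnt in counts[c]] for c in (0, 1)}
    priors = {c: n[c] / len(docs) for c in (0, 1)}
    return priors, probs

def predict(model, vec):
    """Return the class with the highest log posterior score."""
    priors, probs = model
    scores = {}
    for c in (0, 1):
        s = math.log(priors[c])
        for p, v in zip(probs[c], vec):
            s += math.log(p if v else 1 - p)
        scores[c] = s
    return max(scores, key=scores.get)

def accuracy_for_dimension(k):
    """Train and evaluate using only the top-k terms as the input vector."""
    terms = select_terms(TRAIN, k)
    model = train_nb(TRAIN, terms)
    hits = sum(predict(model, to_vector(t, terms)) == y for t, y in TEST)
    return hits / len(TEST)

# Sweep the dimensionality of the input vector and print the accuracy for each.
for k in (2, 5, 10, 20):
    print(k, accuracy_for_dimension(k))
```

On a corpus this small the numbers mean little. The point is only the shape of the experiment: fix the classifier, change the vector size, and compare the resulting accuracy.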