Computing a Comprehensible Model for Spam Filtering

Authors:
Amparo Ruiz-Sepúlveda;José L. Triviño-Rodriguez;Rafael Morales-Bueno
Affiliations:
Department of Computer Science and Artificial Intelligence, University of Málaga, Málaga, Spain;Department of Computer Science and Artificial Intelligence, University of Málaga, Málaga, Spain;Department of Computer Science and Artificial Intelligence, University of Málaga, Málaga, Spain
Venue:
DS '09 Proceedings of the 12th International Conference on Discovery Science
Year:
2009

Citing 17
Cited 0

Original Contribution: Stacked generalization. Neural Networks
C4.5: programs for machine learning
Automated learning of decision rules for text categorization. ACM Transactions on Information Systems (TOIS)
On the boosting ability of top-down decision tree learning algorithms. STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Combining classifiers in text categorization. SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Context-sensitive learning methods for text categorization. SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Boosting and Rocchio applied to text filtering. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical neural networks for text categorization (poster abstract). Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
The Alternating Decision Tree Learning Algorithm. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
How an Ensemble Method Can Compute a Comprehensible Model. DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Searching for Interacting Features for Spam Filtering. ISNN '08 Proceedings of the 5th international symposium on Neural Networks: Advances in Neural Networks
Review: A review of machine learning approaches to Spam filtering. Expert Systems with Applications: An International Journal
Bagging, boosting, and C4.5. AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
A comparative performance study of feature selection methods for the anti-spam filtering domain. ICDM'06 Proceedings of the 6th Industrial Conference on Data Mining conference on Advances in Data Mining: applications in Medicine, Web Mining, Marketing, Image and Signal Mining
Support vector machines for spam categorization. IEEE Transactions on Neural Networks

Abstract

In this paper, we describe the application of the Decision Tree Boosting (DTB) learning model to spam email filtering. This classification task requires learning in a high-dimensional feature space, so it serves as an example of how the DTB algorithm performs on such problems. In [1], it was shown that hypotheses computed by the DTB model are more comprehensible than those computed by other ensemble methods. Hence, this paper tries to show that the DTB algorithm maintains the same comprehensibility of hypotheses in high-dimensional feature space problems while achieving the performance of other ensemble methods. Four traditional evaluation measures (precision, recall, F1 and accuracy) have been considered for performance comparison between DTB and other models usually applied to spam email filtering. The hypothesis computed by DTB is smaller and more comprehensible than the hypotheses computed by AdaBoost and Naïve Bayes.
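The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `spam_metrics`, the label encoding (1 for spam, 0 for legitimate mail), and the choice of spam as the positive class are all assumptions made for this example.

```python
from typing import Dict, Sequence


def spam_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1 and accuracy for a binary spam filter.

    Assumption (not from the paper): label 1 means spam and is treated as
    the positive class; label 0 means legitimate mail.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail passed

    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical labels: 4 spam and 6 legitimate messages.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    for name, value in spam_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

In spam filtering, precision and recall are usually reported alongside accuracy because the two kinds of error have different costs: a false positive hides a legitimate message, while a false negative lets spam through.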