2005 Special Issue: Efficient information theoretic strategies for classifier combination, feature extraction and performance evaluation in improving false positives and false negatives for spam e-mail filtering

Authors:
V. Zorkadis;D. A. Karras;M. Panayotou
Affiliations:
Data Protection Authority and Hellenic Open University, Athens, Greece;Dept. Automation and Hellenic Open University, Chalkis Institute of Technology, Rodu 2, Ano Iliupolis, Athens 16342, Greece;Hellenic Open University, Athens, Greece
Venue:
Neural Networks - 2005 Special issue: IJCNN 2005
Year:
2005

Abstract

Spam e-mail is considered a serious privacy-related violation, besides being a costly, unsolicited communication. Various spam filtering techniques have been proposed so far, mainly based on Naive Bayesian algorithms. Other machine learning algorithms, such as boosting trees or Support Vector Machines (SVM), have also been used with success. However, the number of False Positives (FP) and False Negatives (FN) produced by the various spam e-mail filters remains too high, and the problem of spam e-mail categorization cannot be considered completely solved from a practical viewpoint. In this paper, we propose a novel approach to spam e-mail filtering based on efficient information theoretic techniques for integrating classifiers, for extracting improved features, and for properly evaluating categorization accuracy in terms of FP and FN. The goal of the presented methodology is to empirically but explicitly minimize the FP and FN counts by combining high-performance FP filters with high-performance FN filters that emerged from previous work by the authors [Zorkadis, V., Panayotou, M., & Karras, D. A. (2005). Improved spam e-mail filtering based on committee machines and information theoretic feature extraction. Proceedings of the International Joint Conference on Neural Networks, July 31-August 4, 2005, Montreal, Canada]. To this end, Random Committee-based filters and ADTree-based filters are efficiently combined through information theory. The experiments conducted are among the most extensive so far in the literature, exploiting widely accepted benchmark e-mail data sets and comparing the proposed methodology with the Naive Bayes spam filter as well as with the boosting tree methodology, classification via regression, and other machine learning models. Novel information theoretic measures of FP and FN filtering performance show that the proposed approach compares very favorably with the rival methods. Finally, the proposed information theoretic Boolean features are found to yield remarkably high spam categorization performance.
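
The abstract does not specify how the Boolean features are scored or how FP and FN are measured. As a rough illustration only, the sketch below assumes a standard mutual information score between a Boolean feature and the spam label, together with plain FP and FN counts. The function names `mutual_information` and `fp_fn_counts` and the toy data are hypothetical, and none of this should be read as the authors' actual measures.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X;Y) in bits between a Boolean feature and binary labels.

    Assumption: a textbook plug-in estimate, not the measure used in the paper.
    """
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of each (feature, label) pair
    px = Counter(feature)                  # marginal counts of feature values
    py = Counter(labels)                   # marginal counts of label values
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def fp_fn_counts(predicted, actual):
    """Return (FP, FN): legitimate mail flagged as spam, and spam let through."""
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return fp, fn


# Toy data (hypothetical): 1 = spam, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
has_free = [1, 1, 1, 1, 0, 0, 0, 0]   # feature that perfectly tracks the label
has_hello = [1, 0, 1, 0, 1, 0, 1, 0]  # feature independent of the label
predicted = [1, 1, 0, 1, 1, 0, 0, 0]  # output of some hypothetical filter

print(mutual_information(has_free, labels))
print(mutual_information(has_hello, labels))
print(fp_fn_counts(predicted, labels))
```

In this toy setting the perfectly informative feature scores one bit and the independent feature scores zero, which is the kind of ranking a feature extraction step could use to keep only the most informative Boolean features.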