Spam e-mail classification based on the IFWB algorithm

Authors:
Chichang Jou
Affiliations:
Department of Information Management, Tamkang University, New Taipei City, Taiwan
Venue:
ACIIDS'13 Proceedings of the 5th Asian conference on Intelligent Information and Database Systems - Volume Part I
Year:
2013

Abstract

Spam e-mail has been a recognized problem for some time, and most solutions are based on spam e-mail classification and filtering. However, the content of spam e-mails drifts as new concepts or social events emerge. As a result, several spam classifiers perform effectively when their models are first established, but their performance deteriorates over time. A learning mechanism is therefore required to adjust the classification parameters for both new and old e-mails. In addition, because spam is so widespread, spam e-mails outnumber legitimate e-mails, so most classifiers produce high recall for spam e-mails and low recall for legitimate e-mails. To address both concept drift and data skew in spam e-mail classification, we propose an incremental forgetting weighted algorithm that is based on the Bayesian algorithm, includes a misclassification cost mechanism, and extracts features by IGICF (Information Gain and Inverse Class Frequency). We implemented the algorithm and performed detailed tests on the effectiveness of the mechanism.
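
The abstract does not give the IFWB update rules or the IGICF formula, so the following is only a minimal sketch of the general idea rather than the author's algorithm. It assumes a multinomial naive Bayes model whose counts are multiplied by a decay factor before each new batch (the "forgetting" part) and a decision rule that compares expected misclassification costs (the "cost" part). The class name ForgettingNaiveBayes, the parameters decay, cost_false_positive, cost_false_negative and alpha, and the toy messages are all hypothetical. Feature selection is omitted entirely.

```python
import math
from collections import defaultdict


class ForgettingNaiveBayes:
    """Illustrative naive Bayes with decayed counts and a cost-based decision.

    This is an assumption-laden sketch, not the IFWB algorithm from the paper.
    """

    def __init__(self, decay=0.5, cost_false_positive=5.0,
                 cost_false_negative=1.0, alpha=1.0):
        self.decay = decay                   # weight kept by old evidence per batch
        self.cost_fp = cost_false_positive   # cost of labelling legitimate mail as spam
        self.cost_fn = cost_false_negative   # cost of letting spam through
        self.alpha = alpha                   # Laplace smoothing constant
        self.classes = ("spam", "ham")
        self.doc_weight = {c: 0.0 for c in self.classes}
        self.word_weight = {c: defaultdict(float) for c in self.classes}
        self.total_words = {c: 0.0 for c in self.classes}
        self.vocab = set()

    def update(self, batch):
        """Decay all existing counts, then add one batch of (tokens, label) pairs."""
        for c in self.classes:
            self.doc_weight[c] *= self.decay
            self.total_words[c] *= self.decay
            for word in self.word_weight[c]:
                self.word_weight[c][word] *= self.decay
        for tokens, label in batch:
            self.doc_weight[label] += 1.0
            for word in tokens:
                self.word_weight[label][word] += 1.0
                self.total_words[label] += 1.0
                self.vocab.add(word)

    def _log_joint(self, tokens, c):
        """Smoothed log prior plus log likelihood of the known tokens for class c."""
        total_docs = sum(self.doc_weight.values())
        log_p = math.log((self.doc_weight[c] + self.alpha)
                         / (total_docs + self.alpha * len(self.classes)))
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for word in tokens:
            if word in self.vocab:  # unseen words carry no evidence here
                count = self.word_weight[c].get(word, 0.0)
                log_p += math.log((count + self.alpha) / denom)
        return log_p

    def spam_probability(self, tokens):
        """Posterior probability of spam, normalised over the two classes."""
        log_spam = self._log_joint(tokens, "spam")
        log_ham = self._log_joint(tokens, "ham")
        top = max(log_spam, log_ham)
        e_spam = math.exp(log_spam - top)
        e_ham = math.exp(log_ham - top)
        return e_spam / (e_spam + e_ham)

    def predict(self, tokens):
        """Pick the label with the lower expected misclassification cost."""
        p = self.spam_probability(tokens)
        cost_if_spam = (1.0 - p) * self.cost_fp   # wrong if the mail is legitimate
        cost_if_ham = p * self.cost_fn            # wrong if the mail is spam
        return "spam" if cost_if_spam < cost_if_ham else "ham"


# Toy usage: one initial batch, then a batch in which the same words turn legitimate.
model = ForgettingNaiveBayes(decay=0.5, cost_false_positive=1.0)
model.update([(["cheap", "pills"], "spam"), (["project", "meeting"], "ham")])
before_drift = model.spam_probability(["cheap", "pills"])
model.update([(["cheap", "pills"], "ham")])
after_drift = model.spam_probability(["cheap", "pills"])
```

In this toy run the spam probability of the same message falls after the second batch, because the older spam evidence has been halved while the newer legitimate evidence counts in full. Raising cost_false_positive above cost_false_negative makes the classifier more reluctant to label a message as spam, which is one simple way to protect recall on the legitimate class when spam dominates the training data.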