Spam e-mail has been studied for some time, and most solutions rely on classifying and filtering messages. However, the content of spam e-mails drifts as new concepts and social events emerge. As a result, several spam classifiers perform well when their models are first built but deteriorate over time, so a learning mechanism is needed to adjust the classification parameters for both new and old e-mails. In addition, because spam is so widespread, spam e-mails outnumber legitimate ones, and most classifiers therefore achieve high recall on spam e-mails but low recall on legitimate e-mails. To address both concept drift and data skew in spam e-mail classification, we propose an incremental forgetting weighted algorithm, based on the Bayesian algorithm, with a misclassification cost mechanism; features are extracted by IGICF (Information Gain and Inverse Class Frequency). We implemented the algorithm and tested the effectiveness of the mechanism in detail.
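The abstract names the ingredients of the approach but not its update rules, so the following is only a rough sketch of how incremental learning, forgetting, and a misclassification cost could be combined in a naive Bayes filter. Everything specific here is an assumption made for illustration: the class name `ForgettingNaiveBayes`, the exponential `decay` factor, the `ham_cost` ratio, and the Laplace smoothing are hypothetical and are not taken from the paper. IGICF feature extraction is omitted because its formula is not given.

```python
import math
from collections import defaultdict


class ForgettingNaiveBayes:
    """Multinomial naive Bayes whose evidence decays on every update (illustrative only)."""

    def __init__(self, decay=0.9, ham_cost=5.0, alpha=1.0):
        self.decay = decay        # assumed forgetting factor in (0, 1]; 1.0 disables forgetting
        self.ham_cost = ham_cost  # assumed cost of flagging a legitimate e-mail, relative to missing spam
        self.alpha = alpha        # Laplace smoothing constant
        self.class_weight = {"spam": 0.0, "ham": 0.0}
        self.token_weight = {"spam": defaultdict(float), "ham": defaultdict(float)}
        self.token_total = {"spam": 0.0, "ham": 0.0}
        self.vocab = set()

    def update(self, tokens, label):
        """Incrementally learn one labelled e-mail."""
        # Fade all existing evidence so that older e-mails count for less.
        for c in self.class_weight:
            self.class_weight[c] *= self.decay
            self.token_total[c] *= self.decay
            for t in self.token_weight[c]:
                self.token_weight[c][t] *= self.decay
        # Add the new e-mail at full weight.
        self.class_weight[label] += 1.0
        for t in tokens:
            self.token_weight[label][t] += 1.0
            self.token_total[label] += 1.0
            self.vocab.add(t)

    def log_score(self, tokens, label):
        """Smoothed log prior plus log likelihood of the tokens under one class."""
        total = sum(self.class_weight.values())
        prior = (self.class_weight[label] + self.alpha) / (total + 2 * self.alpha)
        denom = self.token_total[label] + self.alpha * max(len(self.vocab), 1)
        score = math.log(prior)
        for t in tokens:
            score += math.log((self.token_weight[label].get(t, 0.0) + self.alpha) / denom)
        return score

    def predict(self, tokens):
        # Flag as spam only when the posterior odds exceed the cost ratio,
        # which protects recall on the minority (legitimate) class.
        margin = self.log_score(tokens, "spam") - self.log_score(tokens, "ham")
        return "spam" if margin > math.log(self.ham_cost) else "ham"
```

In use, each newly labelled message would be passed to `update(tokens, label)` and each incoming message to `predict(tokens)`. Lowering `decay` makes the filter track drift faster at the price of noisier estimates, and raising `ham_cost` trades spam recall for fewer legitimate e-mails lost.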