As email has become a popular means of communication over the Internet, unsolicited and undesired email, known as spam or junk mail, has become a serious problem. To separate spam from legitimate email, automatic classification approaches based on text mining techniques have been proposed. Such approaches, however, often suffer from low recall because of the nature of spam, namely skewed class distributions and concept drift. This research therefore proposes a classification approach that alleviates the problems of skewed class distributions and drifting concepts. A cluster-based classification method, called ICBC, is developed accordingly. ICBC has two phases. In the first phase, it clusters the emails in each given class into several groups and extracts an equal number of features (keywords) from each group, so that the features of the minority class are adequately represented. In the second phase, ICBC is equipped with an incremental learning mechanism that lets it adapt to changes in the environment quickly and at low cost. Three experiments are conducted to evaluate the performance of ICBC. The results show that ICBC deals effectively with skewed and changing class distributions, and that its incremental learning reduces the cost of retraining. The feasibility of the proposed approach is thus demonstrated.
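The two phases described in the abstract can be illustrated with a minimal sketch. The abstract does not specify ICBC's clustering algorithm, similarity measure, or feature-scoring function, so everything below is an assumption made for illustration: a leader-style clustering step, cosine similarity over raw term counts, term frequency as the keyword score, and the hypothetical names `Cluster`, `assign`, `select_features`, and `classify`. It is not the authors' implementation.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Cluster:
    """A group of emails from one class, stored as summed term counts."""

    def __init__(self, label):
        self.label = label
        self.total = Counter()
        self.size = 0

    def add(self, vec):
        # Incremental update: no other cluster is touched or retrained.
        self.total.update(vec)
        self.size += 1

    def top_terms(self, k):
        # Highest-count terms first; ties broken alphabetically.
        ranked = sorted(self.total.items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, _ in ranked[:k]]


def assign(clusters, label, vec, threshold=0.3):
    """Join the most similar cluster of the same class, or start a new one.

    The threshold value is an arbitrary assumption for this sketch.
    """
    same = [c for c in clusters if c.label == label]
    best = max(same, key=lambda c: cosine(vec, c.total), default=None)
    if best is None or cosine(vec, best.total) < threshold:
        best = Cluster(label)
        clusters.append(best)
    best.add(vec)
    return best


def select_features(clusters, k):
    """Take the same number of keywords from every cluster, so that small
    (minority-class) groups contribute as many features as large ones."""
    features = set()
    for c in clusters:
        features.update(c.top_terms(k))
    return features


def classify(clusters, text):
    """Label an email with the class of its most similar cluster."""
    vec = vectorize(text)
    return max(clusters, key=lambda c: cosine(vec, c.total)).label


# Phase one: cluster the emails within each class (toy data, invented here).
clusters = []
training = [
    ("spam", "cheap pills buy now"),
    ("spam", "buy cheap pills online"),
    ("spam", "win lottery prize money"),
    ("ham", "meeting agenda for monday"),
    ("ham", "monday meeting notes attached"),
]
for label, text in training:
    assign(clusters, label, vectorize(text))

print(sorted(select_features(clusters, 2)))
print(classify(clusters, "cheap lottery money prize"))

# Phase two: a new kind of spam arrives; only one cluster is created or updated.
assign(clusters, "spam", vectorize("crypto investment double returns"))
print(classify(clusters, "double your crypto returns"))
```

Because each cluster keeps only summed term counts, absorbing a new email touches a single cluster, which is one plausible reading of how incremental learning could keep the cost of retraining low.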