Rare class prediction is the data mining task of building a model that can correctly identify objects or events that occur rarely in a data set. In many real-life applications, such as identifying intruders accessing a network system or detecting fraudulent credit card transactions, it is the rare events that are of greatest interest. Unfortunately, traditional mining algorithms often fail to predict rare events because their models are inherently biased toward the majority class, capturing the characteristics that most data instances have in common. Rare class mining is thus a challenging problem in some specific domains. We study the rare class mining problem in the context of semiconductor manufacturing process control, in which faulty products occur rarely but, once they occur, require timely detection to prevent a decrease in product yield. In this paper, we propose using an over-sampling technique to offset the numerical dominance of the majority class. Such a sampling technique is, however, prone to introducing over-fitting. We therefore propose a remedy: applying a cluster-based technique to selectively extract the data instances that show discriminating characteristics. Models built with various mining algorithms have been tested on a separate data set, and the results show a significant improvement in prediction accuracy.
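The abstract does not spell out how the over-sampling and the cluster-based selection are carried out, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a binary problem, random over-sampling with replacement, a plain k-means clustering, and a "keep instances from class-pure clusters" rule as the notion of discriminating instances. All function names, the `purity` threshold, the labels, and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample_minority(X, y, minority_label, seed=0):
    """Randomly duplicate minority instances until both classes are equal in size.

    Assumes a binary problem: every label other than minority_label is majority.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_extra = max(0, n_majority - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return X + extra, y + [minority_label] * n_extra


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point.

    Initial centers are k distinct points, so k must not exceed the number
    of distinct points in X.
    """
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(X)), k)
    assign = [0] * len(X)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            for x in X
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def select_discriminative(X, y, k, purity=0.8, seed=0):
    """Keep instances of the dominant class from clusters that are mostly one class.

    The purity rule is an assumption made for this sketch.
    """
    assign = kmeans(X, k, seed=seed)
    keep_X, keep_y = [], []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        dominant, count = Counter(y[i] for i in idx).most_common(1)[0]
        if count / len(idx) >= purity:
            for i in idx:
                if y[i] == dominant:
                    keep_X.append(X[i])
                    keep_y.append(y[i])
    return keep_X, keep_y


# Toy data: four "pass" items and one rare "fail" item (made-up measurements).
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
y = ["pass", "pass", "pass", "pass", "fail"]

X_bal, y_bal = oversample_minority(X, y, minority_label="fail")
X_sel, y_sel = select_discriminative(X_bal, y_bal, k=2)
```

Note that duplicating minority instances adds no new information, which is one way to see why over-sampling alone can over-fit; the selection step in this sketch simply discards instances that fall in clusters where the two classes are mixed. Any classifier could then be trained on `X_sel` and `y_sel` and evaluated on a separate, untouched test set, as the abstract describes.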