Data preparation techniques for improving rare class prediction

Authors:
Nittaya Kerdprasop; Kittisak Kerdprasop
Affiliations:
Data Engineering Research Unit, School of Computer Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand (both authors)
Venue:
MAMECTIS/NOLASC/CONTROL/WAMUS'11 Proceedings of the 13th WSEAS international conference on mathematical methods, computational techniques and intelligent systems, and 10th WSEAS international conference on non-linear analysis, non-linear systems and chaos, and 7th WSEAS international conference on dynamical systems and control, and 11th WSEAS international conference on Wavelet analysis and multirate systems: recent researches in computational techniques, non-linear systems and control
Year:
2011

Abstract

Rare class prediction is the data mining task of building a model that can correctly identify objects or events that occur rarely in a data set. In many real-life applications, such as identifying intruders accessing a network system or detecting fraudulent credit card transactions, it is the rare events that are of great interest. Unfortunately, traditional mining algorithms fail to predict rare events because their models are inherently biased toward the majority class: they are built to capture the characteristics common to most data instances. Rare class mining is thus a challenging problem in some specific domains. We study the rare class mining problem in the context of semiconductor manufacturing process control, in which faulty products rarely occur but, when they do, require timely detection to prevent a decrease in product yield. In this paper, we propose using an over-sampling technique to alleviate the imbalance caused by the majority class outnumbering the rare class. Such a sampling technique, however, is prone to over-fitting. As a remedy, we apply a cluster-based technique to selectively extract data instances that show discriminative characteristics. Models built with various mining algorithms were tested on a separate data set, and the results show significant improvement in prediction accuracy.
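The abstract names two data preparation steps but does not specify either algorithm. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes plain random over-sampling (duplicating minority rows) for the first step. For the cluster-based step it swaps in a generic technique: k-means on the majority class, keeping only the real instance nearest each centroid. The function names (`random_oversample`, `kmeans`, `cluster_select_majority`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, minority_label, seed=0):
    """Random over-sampling (assumed technique): duplicate randomly chosen
    minority rows until the minority count matches the largest other class."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    largest_other = max(c for lab, c in Counter(y).items() if lab != minority_label)
    X_new, y_new = list(X), list(y)
    for _ in range(max(largest_other - len(minority), 0)):
        X_new.append(rng.choice(minority))
        y_new.append(minority_label)
    return X_new, y_new


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def cluster_select_majority(X, y, majority_label, k, seed=0):
    """Hypothetical cluster-based selection: replace the majority class with
    the real instances closest to its k-means centroids (at most k rows)."""
    majority = [x for x, label in zip(X, y) if label == majority_label]
    centroids = kmeans(majority, k, seed=seed)
    reps = []
    for c in centroids:
        best = min(majority, key=lambda p: math.dist(p, c))
        if best not in reps:  # two centroids may share a nearest instance
            reps.append(best)
    others = [(x, label) for x, label in zip(X, y) if label != majority_label]
    X_sel = reps + [x for x, _ in others]
    y_sel = [majority_label] * len(reps) + [label for _, label in others]
    return X_sel, y_sel


# Toy data (invented for illustration): six "ok" wafers, one "fault".
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (9.0, 0.5)]
y = ["ok", "ok", "ok", "ok", "ok", "ok", "fault"]

X_bal, y_bal = random_oversample(X, y, "fault")
print(Counter(y_bal)["fault"], Counter(y_bal)["ok"])  # 6 6

X_sel, y_sel = cluster_select_majority(X, y, "ok", k=2)
print(y_sel.count("ok"), y_sel.count("fault"))  # 2 1
```

One possible design, again an assumption, is to run `cluster_select_majority` before `random_oversample`: shrinking the majority class first means fewer minority duplicates are needed to balance the data, which limits the exact-copy repetition that makes random over-sampling prone to over-fitting.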