An unbalanced data classification model using hybrid sampling technique for fraud detection

Authors:
T. Maruthi Padmaja; Narendra Dhulipalla; P. Radha Krishna; Raju S. Bapi; A. Laha
Affiliations:
Institute for Development and Research in Banking Technology, Hyderabad, India; Institute for Development and Research in Banking Technology, Hyderabad, India; Institute for Development and Research in Banking Technology, Hyderabad, India; Dept of Computer and Information Sciences, University of Hyderabad, India; Institute for Development and Research in Banking Technology, Hyderabad, India
Venue:
PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence
Year:
2007

Citing 6

Machine Learning for the Detection of Oil Spills in Satellite Radar Images
Machine Learning - Special issue on applications of machine learning and the knowledge discovery process

MetaCost: a general method for making classifiers cost-sensitive
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Distributed Data Mining in Credit Card Fraud Detection
IEEE Intelligent Systems

Minority report in fraud detection: classification of skewed data
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets

SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research

Improved heterogeneous distance functions
Journal of Artificial Intelligence Research

Cited 1

Analysis of an evolutionary RBFN design algorithm, CO2RBFN, for imbalanced data sets
Pattern Recognition Letters

Abstract

Detecting fraud is a challenging task because fraud coexists with the latest technology. A central difficulty is that the dataset is unbalanced: the non-fraudulent class heavily outnumbers the fraudulent class. In this work, we treat fraud detection as an unbalanced data classification problem and propose a model based on a hybrid sampling technique that combines random under-sampling with over-sampling using SMOTE. SMOTE widens the data region corresponding to the minority samples, while random under-sampling of the majority class balances the class distribution. The value difference metric (VDM) serves as the distance measure within SMOTE. We conducted experiments with four classifiers (k-NN, Radial Basis Function networks, C4.5, and Naive Bayes) at varied levels of SMOTE on an insurance fraud dataset. To evaluate the learned classifiers, we use the fraud catching rate and the non-fraud catching rate in addition to the overall accuracy of the classifier. Results indicate that our approach achieves high prediction rates on both the fraud and non-fraud classes.
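
The hybrid sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: features are numeric and compared with Euclidean distance (the paper uses VDM, which this sketch does not implement), the function names `smote`, `hybrid_sample`, and `catching_rates` are hypothetical, the majority class is under-sampled to the size of the over-sampled minority class, and the fraud and non-fraud catching rates are read as the per-class fractions of correctly predicted samples.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority points by interpolating toward neighbours.

    Assumption: numeric features and Euclidean distance, not the VDM
    distance used in the paper. Requires at least two minority points.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


def hybrid_sample(majority, minority, smote_percent=100, k=5, seed=0):
    """Over-sample the minority class with SMOTE, then randomly
    under-sample the majority class to the same size (an assumed target)."""
    rng = random.Random(seed)
    minority = list(minority)
    n_new = len(minority) * smote_percent // 100
    oversampled = minority + smote(minority, n_new, k, rng)
    n_keep = min(len(majority), len(oversampled))
    undersampled = rng.sample(list(majority), n_keep)
    return undersampled, oversampled


def catching_rates(y_true, y_pred, fraud_label=1):
    """Return (fraud catching rate, non-fraud catching rate, accuracy).

    Assumed definitions: each rate is the fraction of that class
    that the classifier predicts correctly.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == fraud_label and p == fraud_label)
    fn = sum(1 for t, p in pairs if t == fraud_label and p != fraud_label)
    tn = sum(1 for t, p in pairs if t != fraud_label and p != fraud_label)
    fp = sum(1 for t, p in pairs if t != fraud_label and p == fraud_label)
    fraud_rate = tp / (tp + fn) if tp + fn else 0.0
    nonfraud_rate = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return fraud_rate, nonfraud_rate, accuracy
```

Varying `smote_percent` corresponds to the "varied levels of SMOTE" mentioned in the abstract; the balanced output of `hybrid_sample` would then be passed to whichever classifier is under study, and `catching_rates` applied to its predictions on held-out data.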