An unbalanced data classification model using hybrid sampling technique for fraud detection

  • Authors:
  • T. Maruthi Padmaja; Narendra Dhulipalla; P. Radha Krishna; Raju S. Bapi; A. Laha

  • Affiliations:
  • Institute for Development and Research in Banking Technology, Hyderabad, India;Institute for Development and Research in Banking Technology, Hyderabad, India;Institute for Development and Research in Banking Technology, Hyderabad, India;Dept of Computer and Information Sciences, University of Hyderabad, India;Institute for Development and Research in Banking Technology, Hyderabad, India

  • Venue:
  • PReMI'07: Proceedings of the 2nd International Conference on Pattern Recognition and Machine Intelligence
  • Year:
  • 2007

Abstract

Detecting fraud is challenging because fraud keeps pace with the latest technology. A central difficulty is that fraud datasets are unbalanced: the non-fraudulent class heavily dominates the fraudulent class. In this work, we treat fraud detection as an unbalanced data classification problem and propose a model based on a hybrid sampling technique that combines random under-sampling with over-sampling using SMOTE. SMOTE widens the data region corresponding to the minority samples, and random under-sampling of the majority class balances the class distribution. The value difference metric (VDM) serves as the distance measure in SMOTE. We conducted experiments with four classifiers, namely k-NN, Radial Basis Function networks, C4.5, and Naive Bayes, at varied levels of SMOTE on an insurance fraud dataset. To evaluate the learned classifiers, we use the fraud catching rate and the non-fraud catching rate, in addition to the overall accuracy of the classifier, as performance measures. Results indicate that our approach achieves high prediction rates for both the fraud and non-fraud classes.
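The hybrid sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; the abstract does not give the procedure in detail, so several points here are assumptions: all features are categorical, VDM uses an exponent of 1, synthetic minority rows are built by a per-feature majority vote over a minority row and its k nearest minority neighbours (in the style of the nominal SMOTE variant), and the majority class is randomly under-sampled down to the enlarged minority size. The function names and the `smote_pct` parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict


def vdm_tables(rows, labels):
    """Per-feature tables of P(class | feature value), the basis of VDM."""
    classes = sorted(set(labels))
    tables = []
    for f in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[f]][y] += 1
        tables.append({
            value: [c[cls] / sum(c.values()) for cls in classes]
            for value, c in counts.items()
        })
    return tables


def vdm_distance(a, b, tables):
    """Sum over features and classes of |P(c | a_f) - P(c | b_f)| (assumed exponent 1)."""
    return sum(
        abs(p - q)
        for f, table in enumerate(tables)
        for p, q in zip(table[a[f]], table[b[f]])
    )


def smote_nominal(minority, tables, n_new, k=3, rng=random):
    """Create n_new synthetic minority rows by per-feature majority vote (assumed variant)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        seed_row = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        # k nearest minority neighbours of the seed row under VDM
        neighbours = sorted(others, key=lambda m: vdm_distance(seed_row, m, tables))[:k]
        group = [seed_row] + neighbours
        synthetic.append(tuple(
            Counter(r[f] for r in group).most_common(1)[0][0]
            for f in range(len(seed_row))
        ))
    return synthetic


def hybrid_sample(rows, labels, minority_label, smote_pct=200, k=3, seed=0):
    """Over-sample the minority class, then randomly under-sample the majority class."""
    rng = random.Random(seed)
    tables = vdm_tables(rows, labels)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority = [(r, y) for r, y in zip(rows, labels) if y != minority_label]
    # smote_pct=200 adds two synthetic rows per original minority row
    n_new = len(minority) * smote_pct // 100
    minority = minority + smote_nominal(minority, tables, n_new, k, rng)
    # Assumed balancing rule: keep as many majority rows as there are minority rows
    kept = rng.sample(majority, min(len(majority), len(minority)))
    new_rows = minority + [r for r, _ in kept]
    new_labels = [minority_label] * len(minority) + [y for _, y in kept]
    return new_rows, new_labels
```

Varying `smote_pct` corresponds loosely to the "varied levels of SMOTE" mentioned in the abstract; the resampled rows would then be passed to a classifier of choice for training.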