Unbalanced Data Classification Using extreme outlier Elimination and Sampling Techniques for Fraud Detection

Authors:
T. Maruthi Padmaja; Narendra Dhulipalla; Raju S. Bapi; P. Radha Krishna
Venue:
ADCOM '07 Proceedings of the 15th International Conference on Advanced Computing and Communications
Year:
2007

Abstract

Detecting fraud in a highly overlapped and imbalanced fraud dataset is a challenging task. To address this problem, we propose a new approach that combines extreme outlier elimination with a hybrid sampling technique. The k Reverse Nearest Neighbors (kRNN) concept is used as a data cleaning method to eliminate extreme outliers in minority regions. The hybrid sampling technique, which combines SMOTE to over-sample the minority data (fraud samples) with random under-sampling of the majority data (non-fraud samples), is used to improve fraud detection accuracy. The method was evaluated in terms of True Positive rate and True Negative rate on an insurance fraud dataset. We conducted experiments with four classifiers, namely C4.5, Naïve Bayes, k-NN, and Radial Basis Function networks, and compared the performance of our approach against a simple hybrid sampling technique. The results show that eliminating extreme outliers from the minority class yields high prediction rates for both the fraud and non-fraud classes.

Keywords: Data Mining, Unbalanced dataset, kRNN, Hybrid Sampling, SMOTE, Fraud Detection.
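
The pipeline outlined in the abstract (kRNN-based cleaning of the minority class, SMOTE-style over-sampling, random under-sampling, and evaluation by True Positive and True Negative rates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the value of k, the outlier threshold, or the sampling ratios, so the rule used here (a minority point with fewer than `min_count` reverse neighbors is treated as an extreme outlier) is an assumption, and all function names are hypothetical.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def reverse_neighbor_counts(points, k):
    """For each point, count how many other points have it among their k nearest."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            counts[j] += 1
    return counts


def drop_extreme_outliers(minority, k, min_count=1):
    """Keep minority points with at least min_count reverse neighbors.

    The threshold rule is an assumption made for this sketch.
    """
    counts = reverse_neighbor_counts(minority, k)
    return [p for p, c in zip(minority, counts) if c >= min_count]


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def random_undersample(majority, n_keep, rng):
    """Keep a random subset of n_keep majority points (without replacement)."""
    return rng.sample(majority, n_keep)


def tpr_tnr(y_true, y_pred):
    """True Positive rate and True Negative rate, with label 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return tp / pos, tn / neg


# Toy data: four clustered fraud points and one isolated point.
rng = random.Random(0)
fraud = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
non_fraud = [(float(x), float(y)) for x in range(3, 8) for y in range(3, 7)]

cleaned = drop_extreme_outliers(fraud, k=2)           # cleaning step
balanced_fraud = cleaned + smote_like(cleaned, 4, 2, rng)  # over-sampling
balanced_non_fraud = random_undersample(non_fraud, 8, rng)  # under-sampling
```

In this toy setup the isolated point at (10, 10) is in no other point's 2-nearest-neighbor list, so the cleaning step discards it before any synthetic samples are generated. The resulting balanced sets would then be passed to a classifier, and `tpr_tnr` applied to its predictions on held-out data.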