A novel synthetic minority oversampling technique for imbalanced data set learning

  • Authors:
  • Sukarna Barua;Md. Monirul Islam;Kazuyuki Murase

  • Affiliations:
  • Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh;Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh;University of Fukui, Fukui, Japan

  • Venue:
  • ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Imbalanced data sets contain an unequal distribution of data samples among the classes and pose a challenge to the learning algorithms as it becomes hard to learn the minority class concepts. Synthetic oversampling techniques address this problem by creating synthetic minority samples to balance the data set. However, most of these techniques may create wrong synthetic minority samples which fall inside majority regions. In this respect, this paper presents a novel Cluster Based Synthetic Oversampling (CBSO) algorithm. CBSO adopts its basic idea from existing synthetic oversampling techniques and incorporates unsupervised clustering in its synthetic data generation mechanism. CBSO ensures that synthetic samples created via this method always lie inside minority regions and thus, avoids any wrong synthetic sample creation. Simualtion analyses on some real world datasets show the effectiveness of CBSO showing improvements in various assesment metrics such as overall accuracy, F-measure, and G-mean.