A normal distribution-based over-sampling approach to imbalanced data classification

  • Authors:
  • Huaxiang Zhang;Zhichao Wang

  • Affiliations:
  • Department of Computer Science, Shandong Normal University, Jinan, Shandong, China;Department of Computer Science, Shandong Normal University, Jinan, Shandong, China

  • Venue:
  • ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I
  • Year:
  • 2011

Abstract

This study proposes a normal distribution-based over-sampling approach to balance the number of instances belonging to different classes in a data set. The balanced training data are used to learn unbiased classifiers for the original data set. Under some conditions, the proposed over-sampling approach generates samples whose expected mean and variance are similar to those of the original minority class data. Because the approach tries to generate synthetic data with a probability distribution similar to that of the original data, and expands the class boundaries of the minority class, it may improve minority class classification performance. Experimental results show that the proposed approach outperforms alternative methods on benchmark data sets most of the time when several classical classification algorithms are used.
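The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a binary problem and assumes each feature is sampled independently from a normal distribution fitted to the minority class; the function name `normal_oversample` and its parameters are hypothetical.

```python
import random
import statistics


def normal_oversample(X, y, minority_label, seed=None):
    """Balance a binary data set with synthetic minority rows.

    Assumption (not taken from the paper): every feature is drawn
    independently from a normal distribution whose mean and standard
    deviation are estimated from the original minority class.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]

    # Number of synthetic rows needed to match the majority class size.
    n_needed = (len(X) - len(minority)) - len(minority)
    if n_needed <= 0 or len(minority) < 2:
        # Already balanced, or too few rows to estimate a variance.
        return list(X), list(y)

    # Per-feature sample mean and sample standard deviation.
    columns = list(zip(*minority))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]

    synthetic = [
        [rng.gauss(m, s) for m, s in zip(means, stdevs)]
        for _ in range(n_needed)
    ]
    return list(X) + synthetic, list(y) + [minority_label] * n_needed
```

Because the synthetic rows are drawn from a distribution with the minority class's estimated mean and variance, their expected mean and variance match those estimates, and some draws will fall outside the range of the observed minority points, which is one way to read "expands the class boundaries". Sampling features independently ignores correlations between them; that is a simplification of this sketch.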