An empirical comparison of repetitive undersampling techniques

  • Authors:
  • Jason Van Hulse; Taghi M. Khoshgoftaar; Amri Napolitano

  • Affiliations:
  • Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL (all authors)

  • Venue:
  • IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration
  • Year:
  • 2009

Abstract

A common problem for data mining and machine learning practitioners is class imbalance. When examples of one class greatly outnumber examples of the other class(es), traditional machine learning algorithms can perform poorly. Random undersampling is a technique that has shown great potential for alleviating the problem of class imbalance. However, undersampling discards training examples, and this information loss can hinder classification performance in some cases. To overcome this problem, repetitive undersampling techniques have been proposed. These techniques generate an ensemble of models, each trained on a different, undersampled subset of the training data. In doing so, less information is lost and classification performance is improved. In this study, we evaluate the performance of several repetitive undersampling techniques; to our knowledge, no prior study has compared them so thoroughly.
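The ensemble idea the abstract describes can be illustrated with a minimal sketch: each member model sees the full minority class plus a fresh random undersample of the majority class, and predictions are combined by majority vote. This is a generic illustration, not the paper's specific algorithms; the function names (`undersample_ensemble`, `train_centroid`, `majority_vote`) and the toy 1-D nearest-centroid learner are assumptions made for the example.

```python
import random

def undersample_ensemble(majority, minority, train_fn, n_models=10, seed=0):
    """Repetitive undersampling (generic sketch): train n_models classifiers,
    each on all minority examples plus a random, class-balanced subset of the
    majority examples, so that across the ensemble little majority-class
    information is discarded."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Draw a fresh balanced undersample of the majority class per model.
        sample = rng.sample(majority, len(minority))
        models.append(train_fn(sample, minority))  # (negatives, positives)
    return models

def train_centroid(neg, pos):
    """Toy 1-D nearest-centroid learner, used only to make the sketch runnable."""
    cn = sum(neg) / len(neg)
    cp = sum(pos) / len(pos)
    return lambda x: 1 if abs(x - cp) < abs(x - cn) else 0

def majority_vote(models, x):
    """Combine ensemble members by simple majority vote (ties go positive)."""
    votes = sum(m(x) for m in models)
    return 1 if votes * 2 >= len(models) else 0
```

Because each member draws its own subset, the ensemble as a whole covers far more of the majority class than any single undersampled training set, which is the intuition behind the reduced information loss.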