Evolutionary data analysis for the class imbalance problem

  • Authors:
  • Taghi M. Khoshgoftaar; Naeem Seliya; Dennis J. Drown

  • Affiliations:
  • (Corresponding e-mail: taghi@cse.fau.edu) Computer and Electrical Engineering and Computer Science Department, Florida Atlantic University, FL, USA; Computer and Information Science Department, University of Michigan - Dearborn, MI, USA; Computer and Electrical Engineering and Computer Science Department, Florida Atlantic University, FL, USA

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2010

Abstract

Class imbalance, where the classes in a dataset are not equally represented, is common in machine learning. Classification models built from such datasets are often impractical because most machine learning algorithms tend to perform poorly on minority class instances. We present an evolutionary computing-based data sampling approach as a solution to the class imbalance problem. The genetic algorithm-based approach, Evolutionary Sampling, is a majority undersampling technique that selectively removes instances from the majority class. This preserves the relative integrity of the majority class while keeping the original minority class intact. Our research prototype, eVann, also implements genetic algorithm-based optimization of modeling parameters for the machine learning algorithms considered in our study. An extensive empirical investigation on four real-world datasets compares the proposed approach with existing data sampling techniques that target the class imbalance problem. Our results show that Evolutionary Sampling, both with and without learner optimization, performs better than the other data sampling techniques evaluated. The detailed coverage of our case studies is intended to support empirical replication.
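
To make the idea of genetic algorithm-based majority undersampling concrete, the following is a minimal sketch and not the eVann implementation. The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so everything here is an assumption made for illustration: each chromosome is a bit mask over the majority instances (1 = keep), fitness is the geometric mean of the true positive and true negative rates of a simple nearest-centroid learner, and the operators are tournament selection, uniform crossover, bit-flip mutation, and two-individual elitism. The names `fitness`, `evolve_mask`, and `make_data` are hypothetical.

```python
import random
from math import sqrt

def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(mask, majority, minority, val_X, val_y):
    """Assumed fitness: geometric mean of TPR and TNR for a nearest-centroid
    learner trained on the kept majority instances plus ALL minority instances.
    val_y must contain both labels (1 = minority, 0 = majority)."""
    kept = [x for x, keep in zip(majority, mask) if keep]
    if not kept:  # a mask that removes every majority instance is useless
        return 0.0
    c_maj, c_min = centroid(kept), centroid(minority)
    tp = tn = pos = neg = 0
    for x, y in zip(val_X, val_y):
        pred = 1 if dist2(x, c_min) < dist2(x, c_maj) else 0
        if y == 1:
            pos += 1
            tp += int(pred == 1)
        else:
            neg += 1
            tn += int(pred == 0)
    return sqrt((tp / pos) * (tn / neg))

def evolve_mask(majority, minority, val_X, val_y,
                pop_size=20, generations=30, p_mut=0.02, rng=None):
    """Evolve a keep/remove mask over the majority class (assumed GA design)."""
    rng = rng or random.Random(0)
    n = len(majority)

    def score(m):
        return fitness(m, majority, minority, val_X, val_y)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = sorted(pop, key=score, reverse=True)[:2]  # elitism: keep top 2
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)

def make_data(rng, n_maj=60, n_min=8):
    """Synthetic two-feature imbalanced data (illustrative only)."""
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_maj)]
    minority = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_min)]
    return majority, minority

rng = random.Random(42)
majority, minority = make_data(rng)
# Simplification: the mask is scored on the full original data; a held-out
# validation split would be the more careful choice.
val_X = majority + minority
val_y = [0] * len(majority) + [1] * len(minority)

mask, best_fit = evolve_mask(majority, minority, val_X, val_y, rng=rng)
sampled_majority = [x for x, keep in zip(majority, mask) if keep]
train_X = sampled_majority + minority  # minority class is left untouched
train_y = [0] * len(sampled_majority) + [1] * len(minority)
```

In this sketch the minority class is never altered; only the mask over majority instances evolves, which mirrors the abstract's description of selective majority removal. Any learner and any imbalance-aware metric could replace the nearest-centroid classifier and the geometric mean used here.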