Classifying imbalanced data using a bagging ensemble variation (BEV)

  • Authors: Cen Li
  • Affiliations: Middle Tennessee State University, Murfreesboro, TN
  • Venue: ACM-SE 45 Proceedings of the 45th annual southeast regional conference
  • Year: 2007

Abstract

In many applications, the collected data are highly skewed: one class clearly dominates the others. Most existing classification systems that perform well on balanced data perform very poorly on imbalanced data, especially on the minority class. Existing work on improving classification quality on imbalanced data includes over-sampling, under-sampling, and methods that modify existing classification systems. This paper discusses the BEV system for classifying imbalanced data. The system builds on ideas from the "Bagging" classification ensemble. The motivation behind the scheme is to make maximal use of the minority class data without creating synthetic data or changing existing classification systems. Experimental results on real-world imbalanced data show the effectiveness of the system.
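
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of a bagging-style scheme that reuses all minority data without synthesizing samples. It assumes the majority class is cut into disjoint subsets about the size of the minority class, that each subset is paired with the full minority set to train one base classifier, and that predictions are combined by majority vote. The nearest-centroid base learner, the toy data, and all function names (`split_majority`, `fit_ensemble`, `predict_ensemble`) are hypothetical stand-ins, not taken from the paper.

```python
import random
from collections import Counter


def split_majority(majority, chunk_size, rng):
    """Shuffle majority rows and cut them into disjoint chunks of about chunk_size."""
    rows = majority[:]
    rng.shuffle(rows)
    n_chunks = max(1, len(rows) // chunk_size)
    return [rows[i::n_chunks] for i in range(n_chunks)]


def fit_centroid(samples):
    """Toy base learner (an assumption): store the mean feature vector per class."""
    sums, counts = {}, Counter()
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: dist2(model[y]))


def fit_ensemble(majority, minority, seed=0):
    """Train one base model per majority chunk; every model sees ALL minority rows."""
    rng = random.Random(seed)
    chunks = split_majority(majority, len(minority), rng)
    return [fit_centroid(chunk + minority) for chunk in chunks]


def predict_ensemble(models, x):
    """Combine the base models by simple majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy imbalanced data (made up for illustration): 90 majority rows, 10 minority rows.
data_rng = random.Random(42)
majority = [((data_rng.gauss(0, 1), data_rng.gauss(0, 1)), 0) for _ in range(90)]
minority = [((data_rng.gauss(4, 1), data_rng.gauss(4, 1)), 1) for _ in range(10)]

models = fit_ensemble(majority, minority)
print(len(models), "base models trained")
print("prediction near the minority cluster:", predict_ensemble(models, (4.0, 4.0)))
```

Under these assumptions every base model trains on a balanced sample, and no minority example is discarded or duplicated; any real classifier could be swapped in for the centroid learner.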