Generating diverse ensembles to counter the problem of class imbalance

  • Authors:
  • T. Ryan Hoens; Nitesh V. Chawla

  • Affiliations:
  • The University of Notre Dame, Notre Dame, IN; The University of Notre Dame, Notre Dame, IN

  • Venue:
  • PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
  • Year:
  • 2010

Abstract

One of the more challenging problems faced by the data mining community is that of imbalanced datasets. In an imbalanced dataset, one class outnumbers the other (sometimes severely), which makes correct and useful predictions difficult to achieve. To combat this, many techniques have been proposed, especially ones centered on sampling methods. In this paper we propose an ensemble framework that combines random subspaces with sampling to overcome the class imbalance problem. We then experimentally verify this technique on a wide variety of datasets. We conclude by analyzing the performance of the ensembles and showing that, overall, our technique provides a significant improvement.
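
The general idea of pairing random subspaces with sampling can be illustrated with a small sketch. The abstract does not say which sampling method or base learner the framework uses, so everything specific here is an assumption made for illustration: random undersampling of the majority class, a nearest-centroid base learner, majority voting, and all function and parameter names (`fit_ensemble`, `subspace_size`, and so on). It should not be read as the authors' implementation.

```python
import random
from collections import Counter


def undersample(rows, labels, rng):
    """Randomly keep n_min rows per class, where n_min is the smallest class size.

    Assumption: the abstract only says "sampling"; undersampling is one option.
    """
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def fit_centroids(rows, labels):
    """Illustrative base learner: one mean vector per class."""
    counts = Counter(labels)
    sums = {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_ensemble(rows, labels, n_members=10, subspace_size=2, seed=0):
    """Train each member on a random feature subset of a rebalanced sample."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    ensemble = []
    for _ in range(n_members):
        features = rng.sample(range(n_features), subspace_size)  # random subspace
        projected = [[row[f] for f in features] for row in rows]
        bal_rows, bal_labels = undersample(projected, labels, rng)  # sampling step
        ensemble.append((features, fit_centroids(bal_rows, bal_labels)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Majority vote over members, each seeing only its own feature subset."""
    votes = Counter(
        predict_centroid(centroids, [row[f] for f in features])
        for features, centroids in ensemble
    )
    return votes.most_common(1)[0][0]


# Synthetic toy data (not from the paper): 90 majority rows near 0, 10 minority rows near 3.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(90)]
rows += [[data_rng.gauss(3.0, 0.5) for _ in range(4)] for _ in range(10)]
labels = [0] * 90 + [1] * 10

ensemble = fit_ensemble(rows, labels, n_members=10, subspace_size=2, seed=0)
minority_guess = predict_ensemble(ensemble, [3.0, 3.0, 3.0, 3.0])
majority_guess = predict_ensemble(ensemble, [0.0, 0.0, 0.0, 0.0])
```

Because every member draws both its own feature subset and its own rebalanced sample, the members differ from one another while each one trains on balanced classes; that combination of diversity and balance is what the sketch is meant to convey.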