Start Globally, Optimize Locally, Predict Globally: Improving Performance on Imbalanced Data

  • Authors:
  • David A. Cieslak; Nitesh V. Chawla

  • Venue:
  • ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
  • Year:
  • 2008

Abstract

Class imbalance is a ubiquitous problem in supervised learning and has gained wide-scale attention in the literature. Perhaps the most prevalent solution is to apply sampling to the training data in order to improve classifier performance. The typical approach applies a uniform level of sampling globally. However, we believe that data is typically multi-modal, which suggests that sampling should be treated locally rather than globally. The purpose of this paper is to propose a framework that first identifies meaningful regions of the data and then finds optimal sampling levels within each. This paper demonstrates that a global classifier trained on locally sampled data produces superior rank-orderings on a wide range of real-world and artificial datasets compared to contemporary global sampling methods.
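The abstract outlines three stages: identify regions of the training data, choose a sampling level inside each region, and train one global classifier whose rank-ordering (ROC AUC) is the objective. The sketch below illustrates that loop under assumptions of our own, not the authors' method: plain k-means stands in for region identification, duplicating minority-class points stands in for the sampling technique, a k-nearest-neighbour scorer is the global classifier, and a greedy per-region search over hypothetical candidate levels maximises AUC on a held-out validation set. All function names and the synthetic data are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Example = Tuple[Point, int]  # (features, label); label 1 marks the minority class


def sq_dist(a: Point, b: Point) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iters: int = 10) -> List[int]:
    """Assumed region finder: plain k-means with a deterministic initialisation."""
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def resample(train: Sequence[Example], regions: Sequence[int], levels: Sequence[int]) -> List[Example]:
    """Local sampling: each minority point in region r is duplicated levels[r] extra times."""
    out: List[Example] = []
    for (x, y), r in zip(train, regions):
        out.extend([(x, y)] * (1 + (levels[r] if y == 1 else 0)))
    return out


def knn_score(train: Sequence[Example], x: Point, k: int = 5) -> float:
    """Global classifier: fraction of minority labels among the k nearest training points."""
    nearest = sorted(train, key=lambda t: sq_dist(t[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-ordering quality: Mann-Whitney estimate of ROC AUC (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the validation set")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(train, regions, levels, valid, k=5) -> float:
    """Train the global scorer on locally sampled data and return validation AUC."""
    sampled = resample(train, regions, levels)
    return auc([knn_score(sampled, x, k) for x, _ in valid], [y for _, y in valid])


def optimize_local_levels(train, valid, n_regions=2, candidates=(0, 1, 2, 4), k=5):
    """Greedy search: tune one region's level at a time, always scoring the global model."""
    regions = kmeans([x for x, _ in train], n_regions)
    levels = [0] * n_regions  # start from the unsampled data
    baseline = best = evaluate(train, regions, levels, valid, k)
    for r in range(n_regions):
        for level in candidates:
            trial = levels.copy()
            trial[r] = level
            score = evaluate(train, regions, trial, valid, k)
            if score > best:  # keep a level only if validation AUC strictly improves
                best, levels = score, trial
    return levels, baseline, best


# Usage on hypothetical synthetic data: two modes with different imbalance ratios.
rng = random.Random(0)


def blob(cx: float, cy: float, n: int, y: int) -> List[Example]:
    return [((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)), y) for _ in range(n)]


train = blob(0, 0, 40, 0) + blob(1.5, 1.5, 6, 1) + blob(8, 8, 40, 0) + blob(9, 9, 12, 1)
valid = blob(0, 0, 20, 0) + blob(1.5, 1.5, 3, 1) + blob(8, 8, 20, 0) + blob(9, 9, 6, 1)
levels, baseline, best = optimize_local_levels(train, valid)
print("levels per region:", levels, "baseline AUC:", round(baseline, 3), "tuned AUC:", round(best, 3))
```

Every candidate level is scored with the single global model rather than a per-region model, which is one way to read "optimize locally, predict globally". Because level 0 is always a candidate and a level is kept only on strict improvement, the tuned validation AUC can never fall below the unsampled baseline; with a search this greedy, a real study would still need a separate test set to guard against overfitting the validation data.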