Class imbalances versus small disjuncts

  • Authors:
  • Taeho Jo;Nathalie Japkowicz

  • Affiliations:
  • University of Ottawa, Ottawa, Ontario, Canada;University of Ottawa, Ottawa, Ontario, Canada

  • Venue:
  • ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
  • Year:
  • 2004

Abstract

It is often assumed that class imbalances are responsible for significant losses of performance in standard classifiers. The purpose of this paper is to question whether class imbalances are truly responsible for this degradation or whether it can be explained in some other way. Our experiments suggest that the problem is not directly caused by class imbalances, but rather that class imbalances may yield small disjuncts which, in turn, cause the degradation. We argue that, in order to improve classifier performance, it may then be more useful to focus on the small disjuncts problem than on the class imbalance problem. We experiment with a method that takes the small disjuncts problem into consideration, and show that it indeed yields performance superior to that obtained using standard or advanced solutions to the class imbalance problem.
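
The distinction drawn in the abstract between class imbalance and small disjuncts can be illustrated with a minimal sketch. It does not reproduce the paper's experiments or its method. It assumes, purely for illustration, that each within-class subcluster stands in for one disjunct, and that a disjunct is "small" when it covers fewer than a chosen threshold of examples. The function names, the toy datasets, and the threshold of 10 are all hypothetical.

```python
from collections import Counter


def disjunct_sizes(examples):
    """Count examples per (class_label, subcluster_id) pair.

    Assumption: each within-class subcluster plays the role of one disjunct.
    """
    return Counter(examples)


def small_disjuncts(examples, threshold):
    """Return the (class, subcluster) pairs covering fewer than `threshold` examples."""
    sizes = disjunct_sizes(examples)
    return sorted(key for key, n in sizes.items() if n < threshold)


def class_ratio(examples):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(label for label, _ in examples)
    return max(counts.values()) / min(counts.values())


# Toy data (hypothetical): classes are balanced 50:50, but the positive
# class is split into one large subcluster and two tiny ones.
balanced = (
    [("neg", 0)] * 50
    + [("pos", 0)] * 44
    + [("pos", 1)] * 3
    + [("pos", 2)] * 3
)

# Toy data (hypothetical): classes are imbalanced 5:1, but each class
# forms a single subcluster, so no disjunct is small.
imbalanced = [("neg", 0)] * 100 + [("pos", 0)] * 20

for name, data in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(name, class_ratio(data), small_disjuncts(data, threshold=10))
```

In this toy setting the balanced dataset contains two small disjuncts while the imbalanced one contains none, which shows only that the two properties can vary independently. Whether small disjuncts actually degrade a given classifier is the empirical question the paper addresses.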