Mining with rarity: a unifying framework

  • Authors:
  • Gary M. Weiss

  • Affiliations:
  • AT&T Laboratories, Piscataway, NJ

  • Venue:
  • ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
  • Year:
  • 2004

Abstract

Rare objects are often of great interest and great value. Until recently, however, rarity has not received much attention in the context of data mining. Now, as increasingly complex real-world problems are addressed, rarity and the related problem of imbalanced data are taking center stage. This article discusses the role that rare classes and rare cases play in data mining. It describes in detail the problems that these two forms of rarity can cause, as well as methods for addressing them. These descriptions draw on examples from existing research, so the article also serves as a survey of the literature on rarity in data mining. Finally, the article demonstrates that rare classes and rare cases are very similar phenomena: both forms of rarity are shown to cause similar problems during data mining and to benefit from the same remediation methods.