Mining Data with Rare Events: A Case Study

  • Authors:
  • Chris Seiffert;Taghi M. Khoshgoftaar;Jason Van Hulse;Amri Napolitano

  • Venue:
  • ICTAI '07 Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 02
  • Year:
  • 2007

Abstract

The performance of classification models can be negatively impacted if the data on which they are trained contains very rare events. While recent research has investigated the issue of class imbalance, few if any studies address issues related to the handling of extreme imbalance (rare events), where the minority class can account for as little as 0.1% of the training data. This work investigates the effect of dataset size and class distribution on classification performance when examples from the minority class are rare. In addition, we compare the performance improvement achieved by acquiring additional examples to that of applying data sampling. Our results demonstrate that data sampling is very effective at alleviating the problem of rare events.
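
To make the idea of data sampling concrete, the following is a minimal sketch of random undersampling, one common sampling technique. The abstract does not say which sampling methods the paper evaluates, so this choice is an assumption, as are the function name `random_undersample`, its parameters, and the synthetic labels. It keeps every minority example and discards majority examples at random until the minority class reaches a chosen share of the data.

```python
import random


def random_undersample(labels, minority_label, target_minority_fraction, seed=0):
    """Return sorted indices of a resampled dataset.

    Hypothetical helper for illustration: all minority examples are kept,
    and majority examples are dropped at random until the minority class
    makes up roughly `target_minority_fraction` of the result.
    """
    if not 0 < target_minority_fraction < 1:
        raise ValueError("target_minority_fraction must be in (0, 1)")

    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority examples found")

    # Majority count m such that len(minority) / (len(minority) + m)
    # equals the target fraction, capped at what is available.
    n_keep = round(len(minority) * (1 - target_minority_fraction)
                   / target_minority_fraction)
    n_keep = min(n_keep, len(majority))

    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


# Synthetic labels (assumed, not from the paper): 10,000 examples,
# of which 10 (0.1%) belong to the minority class.
labels = [1] * 10 + [0] * 9990
idx = random_undersample(labels, minority_label=1, target_minority_fraction=0.1)
resampled = [labels[i] for i in idx]
```

With these synthetic labels, the resampled set holds all 10 minority examples plus 90 randomly chosen majority examples, raising the minority share from 0.1% to 10%. Sampling of this kind is applied to the training data only, so that evaluation still reflects the original class distribution.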