A Case Study for Learning from Imbalanced Data Sets

  • Authors:
  • Aijun An;Nick Cercone;Xiangji Huang

  • Affiliations:
  • -;-;-

  • Venue:
  • AI '01 Proceedings of the 14th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present our experience in applying a rule induction technique to an extremely imbalanced pharmaceutical data set. We focus on using a variety of performance measures to evaluate a number of rule quality measures. We also investigate whether simply changing the distribution skew in the training data can improve predictive performance. Finally, we propose a method for adjusting the learning algorithm for learning in an extremely imbalanced environment. Our experimental results show that this adjustment improves predictive performance for rule quality formulas in which rule coverage makes positive contributions to the rule quality value.