Lazy Bagging for Classifying Imbalanced Data

  • Authors:
  • Xingquan Zhu

  • Venue:
  • ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
  • Year:
  • 2007


Abstract

In this paper, we propose a Lazy Bagging (LB) design that builds bootstrap replicate bags based on the characteristics of the test instances. Upon receiving a test instance Ik, LB trims the bootstrap bags by taking Ik's nearest neighbors in the training set into consideration. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information that lets learners refine their local decision boundaries for classifying that instance. By taking full advantage of Ik's nearest neighbors, the base learners incur less bias and variance in classifying Ik. This strategy is beneficial for classifying imbalanced data because refining local decision boundaries helps a learner reduce its inherent bias towards the majority class and improve its performance on minority-class examples. Our experimental results confirm that LB outperforms C4.5 and traditional bagging (TB) in terms of reducing classification error, and, most importantly, that this error reduction comes largely from LB's improvement on minority-class examples.
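The per-instance idea described above can be sketched as follows. This is a hypothetical illustration, not the paper's exact algorithm: the function names (`lazy_bagging_predict`, `one_nn`), the choice of Euclidean distance, the bag size `n - k`, and the use of a 1-NN base learner are all assumptions made for the sake of a runnable example; the paper's own bag-composition and neighbor-selection details may differ.

```python
import numpy as np
from collections import Counter

def lazy_bagging_predict(X_train, y_train, x_test, base_fit_predict,
                         n_bags=10, k=5, rng=None):
    """Sketch of lazy bagging for one test instance: every bootstrap bag
    is forced to contain the instance's k nearest training neighbors,
    then the base learners' predictions are combined by majority vote."""
    rng = np.random.default_rng(rng)
    # k nearest neighbors of x_test in the training set (Euclidean distance)
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nn_idx = np.argsort(dists)[:k]
    n = len(X_train)
    votes = []
    for _ in range(n_bags):
        # bootstrap sample of size n - k, then append the k nearest neighbors
        # so each bag is biased toward the test instance's local region
        boot = rng.integers(0, n, size=n - k)
        idx = np.concatenate([boot, nn_idx])
        votes.append(base_fit_predict(X_train[idx], y_train[idx], x_test))
    # majority vote across the bags
    return Counter(votes).most_common(1)[0][0]

def one_nn(X_bag, y_bag, x):
    """Trivial stand-in base learner: 1-nearest-neighbor on the bag."""
    return y_bag[np.argmin(np.linalg.norm(X_bag - x, axis=1))]
```

On a toy imbalanced set (a large cluster of majority-class points and a small minority cluster), a test point near the minority cluster is classified by bags that are guaranteed to contain its minority-class neighbors, which is the mechanism the abstract credits for the improvement on minority-class examples.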