Selective Pre-processing of Imbalanced Data for Improving Classification Performance

  • Authors:
  • Jerzy Stefanowski; Szymon Wilk

  • Affiliations:
  • Institute of Computing Science, Poznań University of Technology, Poznań, Poland 60-965; Institute of Computing Science, Poznań University of Technology, Poznań, Poland 60-965 and Telfer School of Management, University of Ottawa, Ottawa, Canada K1N 6N5

  • Venue:
  • DaWaK '08: Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2008

Abstract

In this paper we discuss problems of constructing classifiers from imbalanced data. We describe a new approach to selective pre-processing of imbalanced data that combines local over-sampling of the minority class with filtering of difficult examples from the majority classes. In experiments focused on rule-based and tree-based classifiers, we compare our approach with two related pre-processing methods: NCR (Neighbourhood Cleaning Rule) and SMOTE (Synthetic Minority Over-sampling Technique). The results show that NCR is too strongly biased toward the minority class, which leads to deteriorated specificity and overall accuracy, while SMOTE and our approach do not exhibit such behavior. An analysis of the degree to which the original class distribution is modified also reveals that our approach does not change it as extensively as SMOTE does.
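The abstract names only the two ingredients of the approach: local over-sampling of the minority class and filtering of difficult majority examples. The following is a minimal sketch of how these could fit together. It is not the authors' exact algorithm. It assumes that "difficult" means misclassified by a k-nearest-neighbour vote (k = 3, Euclidean distance) on the original data. It also assumes that each difficult minority example is copied once for every safe majority neighbour it has. The function names (`nearest_neighbours`, `is_safe`, `selective_preprocess`), this copy weighting, and the toy data are hypothetical illustrations.

```python
from collections import Counter
from math import dist

# Each example is a (features, label) pair; features is a tuple of floats.

def nearest_neighbours(data, i, k):
    """Indices of the k nearest neighbours of data[i] (Euclidean), excluding i itself.
    Ties are broken by original index because sorted() is stable."""
    x = data[i][0]
    others = [j for j in range(len(data)) if j != i]
    return sorted(others, key=lambda j: dist(x, data[j][0]))[:k]


def is_safe(data, i, k):
    """Assumed rule: an example is 'safe' if the majority label of its k neighbours equals its own."""
    labels = [data[j][1] for j in nearest_neighbours(data, i, k)]
    return Counter(labels).most_common(1)[0][0] == data[i][1]


def selective_preprocess(data, minority, k=3):
    """Hypothetical sketch: drop noisy majority examples, locally over-sample noisy minority ones.
    Returns the processed list and a report of how many examples were added and removed."""
    # Flag every example once, on the original data, so later changes cannot affect the flags.
    safe = [is_safe(data, i, k) for i in range(len(data))]
    result, added, removed = [], 0, 0
    for i, (x, y) in enumerate(data):
        if y != minority:
            if safe[i]:
                result.append((x, y))   # keep safe majority examples unchanged
            else:
                removed += 1            # filter out difficult (noisy) majority examples
            continue
        result.append((x, y))           # every minority example is kept
        if not safe[i]:
            # Local over-sampling (assumed weighting): one copy per safe majority neighbour.
            n_copies = sum(
                1 for j in nearest_neighbours(data, i, k)
                if data[j][1] != minority and safe[j]
            )
            result.extend([(x, y)] * n_copies)
            added += n_copies
    return result, {"added": added, "removed": removed}


# Toy data: a majority cluster at 0..5, a minority cluster at 10..12,
# plus one stray example of each class inside the other class's cluster.
data = [((float(v),), "maj") for v in range(6)]
data += [((v,), "min") for v in (10.0, 11.0, 12.0)]
data += [((2.5,), "min"), ((11.5,), "maj")]

processed, report = selective_preprocess(data, minority="min")
print(report)  # {'added': 3, 'removed': 1}
```

Here the stray majority example at 11.5 is removed. The stray minority example at 2.5 is kept and copied three times, because all three of its neighbours are safe majority examples. The other nine examples are untouched. The `report` dictionary is a crude measure of how much the data was altered. Contrast this with SMOTE, which adds synthetic examples by interpolating between minority neighbours across the whole minority class rather than only around difficult examples.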