Boosting prediction accuracy on imbalanced datasets with SVM ensembles

  • Authors:
  • Yang Liu; Aijun An; Xiangji Huang

  • Affiliations:
  • Department of Computer Science and Engineering, York University, Toronto, Ontario, Canada (all three authors)

  • Venue:
  • PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2006

Abstract

Learning from imbalanced datasets is inherently difficult because little information is available about the minority class. In this paper, we study the performance of support vector machines (SVMs), which have achieved great success in many real-world applications, on imbalanced data. Through empirical analysis, we show that SVMs suffer from biased decision boundaries and that their prediction performance drops dramatically when the data are highly skewed. We propose combining an integrated sampling technique, which uses both over-sampling and under-sampling, with an ensemble of SVMs to improve prediction performance. An empirical study shows that our method outperforms individual SVMs as well as several other state-of-the-art classifiers.
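
The idea of pairing integrated sampling with a classifier ensemble can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption made for illustration: the minority class is over-sampled by random duplication, the majority class is under-sampled by splitting it into disjoint chunks, each chunk plus the enlarged minority set trains one base classifier, and predictions are combined by majority vote. The function names (`integrated_samples`, `train_ensemble`, `predict_majority_vote`) are hypothetical, and `nearest_centroid` is only a dependency-free stand-in for an SVM trainer.

```python
import random
from collections import Counter


def integrated_samples(majority, minority, n_models, oversample_factor, seed=0):
    """Build one training set per ensemble member.

    Assumed scheme (not taken from the abstract): random duplication for
    over-sampling, disjoint chunks of the majority class for under-sampling.
    Labels: 0 = majority class, 1 = minority class.
    """
    rng = random.Random(seed)
    # Over-sampling: keep the original minority examples and add random duplicates.
    n_extra = len(minority) * (oversample_factor - 1)
    boosted_minority = list(minority) + [rng.choice(minority) for _ in range(n_extra)]
    # Under-sampling: shuffle the majority class and deal it into n_models chunks.
    shuffled = list(majority)
    rng.shuffle(shuffled)
    chunks = [shuffled[i::n_models] for i in range(n_models)]
    # Each training set is one majority chunk plus the enlarged minority set.
    return [
        [(x, 0) for x in chunk] + [(x, 1) for x in boosted_minority]
        for chunk in chunks
    ]


def train_ensemble(training_sets, train):
    """Fit one base classifier per set; train(X, y) must return a predict function."""
    models = []
    for data in training_sets:
        X = [x for x, _ in data]
        y = [label for _, label in data]
        models.append(train(X, y))
    return models


def predict_majority_vote(models, x):
    """Return the label predicted by most ensemble members (use an odd n_models)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def nearest_centroid(X, y):
    """Toy stand-in for an SVM trainer, used only to keep the sketch self-contained."""
    def mean(points):
        return [sum(col) / len(points) for col in zip(*points)]

    centroids = {
        label: mean([x for x, lab in zip(X, y) if lab == label]) for label in set(y)
    }

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


if __name__ == "__main__":
    majority = [(i * 0.01, 0.0) for i in range(90)]       # 90 majority examples
    minority = [(5.0 + i * 0.1, 5.0) for i in range(10)]  # 10 minority examples
    sets = integrated_samples(majority, minority, n_models=3, oversample_factor=3)
    models = train_ensemble(sets, nearest_centroid)
    print(predict_majority_vote(models, (5.2, 4.8)))
```

With a 9:1 imbalance, three ensemble members and an over-sampling factor of 3 give each member 30 majority and 30 minority examples, so no single classifier sees the skew. To use real SVMs, the `train` callable could wrap a library trainer, for example by fitting scikit-learn's `sklearn.svm.SVC` and returning a function that calls its `predict` method on a single example; that wiring is left out here to avoid the external dependency.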