Boosting support vector machines for imbalanced data sets

  • Authors:
  • Benjamin X. Wang;Nathalie Japkowicz

  • Affiliations:
  • School of information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada;School of information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada

  • Venue:
  • ISMIS'08 Proceedings of the 17th international conference on Foundations of intelligent systems
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Real world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed vector spaces or lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we choose to use a combination of both approaches. We use support vector machines with soft margins as the base classifier to solve the skewed vector spaces problem. Then we use a boosting algorithm to get an ensemble classifier that has lower error than a single classifier.We found that this ensemble of SVMs makes an impressive improvement in prediction performance, not only for the majority class, but also for the minority class.