Boosting support vector machines for imbalanced data sets

Authors:
Benjamin X. Wang;Nathalie Japkowicz
Affiliations:
School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada;School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada
Venue:
ISMIS'08 Proceedings of the 17th international conference on Foundations of intelligent systems
Year:
2008

Abstract

Real-world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the instances of one class greatly outnumber the instances of the other class. Such data sets often cause a default classifier to be built due to skewed vector spaces or lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we choose to use a combination of both approaches. We use support vector machines with soft margins as the base classifier to solve the skewed vector spaces problem. Then we use a boosting algorithm to get an ensemble classifier that has lower error than a single classifier. We found that this ensemble of SVMs makes an impressive improvement in prediction performance, not only for the majority class, but also for the minority class.
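
The "default classifier" problem mentioned in the abstract can be illustrated with a small sketch. It assumes the usual reading of the term, a classifier that assigns every instance to the majority class, and uses a hypothetical data set of 95 majority and 5 minority instances. The helper names (`majority_baseline`, `accuracy`, `recall`) are illustrative and do not come from the paper. This is not the authors' boosted-SVM method, only a demonstration of why overall accuracy hides minority-class failure.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label; a default classifier predicts it for every input."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of instances of class `positive` that were predicted as `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in relevant) / len(relevant)


# Hypothetical imbalanced data set: 95 majority-class (0) and 5 minority-class (1) instances.
y_true = [0] * 95 + [1] * 5

# The default classifier ignores its input and always predicts the majority label.
default_label = majority_baseline(y_true)
y_pred = [default_label] * len(y_true)

acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)
majority_recall = recall(y_true, y_pred, positive=0)

print(acc, minority_recall, majority_recall)
```

Under these assumptions the baseline reaches 95% accuracy while recognizing none of the minority instances, which is why per-class performance, as reported in the abstract, is the more informative measure on imbalanced data.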