RUSBoost: A Hybrid Approach to Alleviating Class Imbalance

Authors:
C. Seiffert; T. M. Khoshgoftaar; J. Van Hulse; A. Napolitano
Affiliations:
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Year:
2010


Abstract

Class imbalance is a problem that is common to many application domains. When examples of one class in a training data set vastly outnumber examples of the other class(es), traditional data mining algorithms tend to create suboptimal classification models. Several techniques have been used to alleviate the problem of class imbalance, including data sampling and boosting. In this paper, we present a new hybrid sampling/boosting algorithm, called RUSBoost, for learning from skewed training data. This algorithm provides a simpler and faster alternative to SMOTEBoost, which is another algorithm that combines boosting and data sampling. This paper evaluates the performance of RUSBoost and SMOTEBoost, as well as their individual components (random undersampling, the synthetic minority oversampling technique (SMOTE), and AdaBoost). We conduct experiments using 15 data sets from various application domains, four base learners, and four evaluation metrics. RUSBoost and SMOTEBoost both outperform the other procedures, and RUSBoost performs comparably to (and often better than) SMOTEBoost while being a simpler and faster technique. Given these experimental results, we highly recommend RUSBoost as an attractive alternative for improving the classification performance of learners built using imbalanced data.
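
The abstract describes RUSBoost only at a high level: random undersampling combined with boosting. The following is a minimal sketch of that idea, not the authors' exact procedure. It makes several assumptions that the abstract does not state: a binary problem with labels in {-1, +1} where +1 is the minority class, a 1:1 class ratio after undersampling, plain discrete AdaBoost weight updates, and a decision stump as the weak learner. The names `fit_stump`, `stump_predict`, `rusboost_fit`, and `rusboost_predict` are hypothetical helpers introduced for illustration.

```python
import math
import random


def fit_stump(X, y, w):
    """Weighted decision stump; returns (feature index, threshold, polarity)."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # Weighted error of the rule: predict pol if x[j] > t, else -pol.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] > t else -pol) != yi)
                if err < best[0]:
                    best = (err, j, t, pol)
    return best[1:]


def stump_predict(stump, row):
    j, t, pol = stump
    return pol if row[j] > t else -pol


def rusboost_fit(X, y, rounds=10, seed=0):
    """Sketch of undersampling inside boosting (assumes both classes are present)."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    minority = [i for i in range(n) if y[i] == 1]
    majority = [i for i in range(n) if y[i] == -1]
    ensemble = []
    for _ in range(rounds):
        # Random undersampling: keep every minority example and draw an
        # equal number of majority examples (the 1:1 ratio is an assumption).
        k = min(len(majority), len(minority))
        keep = minority + rng.sample(majority, k)
        total = sum(w[i] for i in keep)
        stump = fit_stump([X[i] for i in keep],
                          [y[i] for i in keep],
                          [w[i] / total for i in keep])
        # Evaluate the weak learner on the full training set with the
        # current boosting weights, then update those weights.
        preds = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        norm = sum(w)
        w = [wi / norm for wi in w]
        ensemble.append((alpha, stump))
    return ensemble


def rusboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


# Toy usage: 20 majority examples and 3 minority examples.
X_toy = [[0.0, float(i)] for i in range(20)] + [[1.0, float(i)] for i in range(3)]
y_toy = [-1] * 20 + [1] * 3
model = rusboost_fit(X_toy, y_toy, rounds=5)
```

Because each round trains on a small balanced subset rather than on synthetically enlarged data, the per-round training cost shrinks, which is consistent with the abstract's claim that RUSBoost is simpler and faster than SMOTEBoost.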