Class imbalance is a problem common to many application domains. When examples of one class in a training data set vastly outnumber examples of the other class(es), traditional data mining algorithms tend to create suboptimal classification models. Several techniques have been used to alleviate the problem of class imbalance, including data sampling and boosting. In this paper, we present a new hybrid sampling/boosting algorithm, called RUSBoost, for learning from skewed training data. This algorithm provides a simpler and faster alternative to SMOTEBoost, another algorithm that combines boosting and data sampling. This paper evaluates the performance of RUSBoost and SMOTEBoost, as well as their individual components (random undersampling, the synthetic minority oversampling technique, and AdaBoost). We conduct experiments using 15 data sets from various application domains, four base learners, and four evaluation metrics. RUSBoost and SMOTEBoost both outperform the other procedures, and RUSBoost performs comparably to (and often better than) SMOTEBoost while being a simpler and faster technique. Given these experimental results, we highly recommend RUSBoost as an attractive alternative for improving the classification performance of learners built using imbalanced data.
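The hybrid the abstract describes — random undersampling of the majority class applied inside each AdaBoost round — can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it assumes binary labels in {-1, +1}, uses single-feature decision stumps as the base learner, and omits the refinements of the full algorithm.

```python
import math
import random

def stump_fit(X, y, w):
    """Fit a 1-D decision stump (threshold + polarity) minimizing weighted error."""
    best = None
    for thr in sorted({x[0] for x in X}):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if pol * (1 if xi[0] >= thr else -1) != yi)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best[1], best[2]

def stump_predict(model, x):
    thr, pol = model
    return pol * (1 if x[0] >= thr else -1)

def rusboost(X, y, rounds=10, seed=0):
    """RUSBoost-style loop: undersample the majority class each boosting round."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                      # AdaBoost example weights
    minority = 1 if sum(yi == 1 for yi in y) <= sum(yi == -1 for yi in y) else -1
    min_idx = [i for i in range(n) if y[i] == minority]
    maj_idx = [i for i in range(n) if y[i] != minority]
    ensemble = []
    for _ in range(rounds):
        # Random undersampling: keep all minority examples, sample an
        # equal number of majority examples for this round's training set.
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        model = stump_fit([X[i] for i in idx], [y[i] for i in idx],
                          [w[i] for i in idx])
        # Weighted error and weight update are computed on the FULL data set.
        preds = [stump_predict(model, x) for x in X]
        err = sum(w[i] for i in range(n) if preds[i] != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) / div-by-zero
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, model))
    return ensemble

def predict(ensemble, x):
    s = sum(alpha * stump_predict(model, x) for alpha, model in ensemble)
    return 1 if s >= 0 else -1
```

Because each round trains on a small balanced sample but reweights on the full data, the sketch keeps undersampling cheap (the appeal of RUSBoost over SMOTEBoost, which must synthesize new minority examples every round) while boosting still focuses later rounds on the examples the ensemble gets wrong.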