An empirical evaluation of bagging with different algorithms on imbalanced data

  • Authors:
  • Guohua Liang; Chengqi Zhang

  • Affiliations:
  • The Centre for Quantum Computation & Intelligent Systems, FEIT, University of Technology, Sydney, NSW, Australia (both authors)

  • Venue:
  • ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I
  • Year:
  • 2011

Abstract

This study investigates the effectiveness of bagging with respect to different learning algorithms on imbalanced data-sets. The purpose of this research is to investigate the performance of bagging based on two unique approaches: (1) categorize base learners with respect to 12 different learning algorithms in general terms, and (2) evaluate the performance of bagging predictors on data with imbalanced class distributions. The former approach develops a method to categorize base learners by using two-dimensional robustness and stability decomposition on 48 benchmark data-sets, while the latter approach investigates the performance of bagging predictors by using the evaluation metrics True Positive Rate (TPR) and Geometric mean (G-mean) for the accuracy on the majority and minority classes, together with the Receiver Operating Characteristic (ROC) curve, on 12 imbalanced data-sets. Our studies indicate that both stability and robustness are important factors for building high-performance bagging predictors on data with imbalanced class distributions. The experimental results demonstrate that PART and Multi-layer Perceptron (MLP) are the learning algorithms with the best bagging performance on the 12 imbalanced data-sets. Moreover, only four out of 12 bagging predictors are statistically superior to single learners based on both the G-mean and TPR evaluation metrics over the 12 imbalanced data-sets.
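
As a rough illustration of the evaluation described in the abstract, the sketch below (not the authors' code) trains a bagging predictor on a synthetic imbalanced data set and reports TPR, G-mean, and the area under the ROC curve. The scikit-learn estimators, the decision-tree base learner, and the class ratio are assumptions chosen only for illustration.

```python
# Minimal sketch: evaluating a bagging predictor on imbalanced data with
# TPR, G-mean, and ROC AUC. Data set and base learner are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 10% minority (positive) class.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Bagging predictor over an unstable base learner (here a decision tree).
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0)
bag.fit(X_tr, y_tr)

y_pred = bag.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
tpr = tp / (tp + fn)          # accuracy on the minority class
tnr = tn / (tn + fp)          # accuracy on the majority class
g_mean = np.sqrt(tpr * tnr)   # geometric mean of the two class accuracies
auc = roc_auc_score(y_te, bag.predict_proba(X_te)[:, 1])  # area under ROC

print(f"TPR={tpr:.3f}  G-mean={g_mean:.3f}  ROC AUC={auc:.3f}")
```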