Empirical study of bagging predictors on medical data

Authors:
Guohua Liang; Chengqi Zhang
Affiliations:
University of Technology, Sydney NSW Australia; University of Technology, Sydney NSW Australia
Venue:
AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
Year:
2011

Abstract

This study investigates the performance of bagging when learning from imbalanced medical data. Highly accurate prediction models are important to data miners, and especially so in imbalanced medical applications. In these settings, practitioners are more interested in the minority class than in the majority class; however, it is hard for a traditional supervised learning algorithm to make highly accurate predictions on the minority class, even when it scores well on the most commonly used evaluation metric, Accuracy. Bagging is a simple yet effective ensemble method that has been applied to many real-world applications. However, several questions have not been well answered: whether bagging outperforms single learners on medical data-sets; which learners are the best predictors for each medical data-set; and what is the best predictive performance achievable on each medical data-set when sampling techniques are applied. We perform an extensive empirical study of 12 learning algorithms on 8 medical data-sets using four evaluation metrics: True Positive Rate (TPR), True Negative Rate (TNR), the Geometric Mean (G-mean) of the accuracy rates of the majority and minority classes, and Accuracy. In addition, statistical analyses are performed to instil confidence in the validity of the conclusions of this research.
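To illustrate why Accuracy alone can mislead on imbalanced data, the four metrics named above can be computed from a binary confusion matrix, with G-mean taken as the square root of TPR × TNR. The sketch below is a minimal illustration, not the study's evaluation code: it assumes the minority class is labelled 1 and treated as the positive class, and the function name `imbalance_metrics` and the toy labels are hypothetical.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute TPR, TNR, G-mean and Accuracy for a binary task.

    Assumption: the minority class is the positive class (label 1).
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    # TPR: accuracy rate on the minority (positive) class
    tpr = tp / (tp + fn) if tp + fn else 0.0
    # TNR: accuracy rate on the majority (negative) class
    tnr = tn / (tn + fp) if tn + fp else 0.0
    # G-mean: geometric mean of the two per-class accuracy rates
    g_mean = math.sqrt(tpr * tnr)
    # Accuracy: fraction of all cases classified correctly
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"TPR": tpr, "TNR": tnr, "G-mean": g_mean, "Accuracy": accuracy}


# Hypothetical toy data: 95 majority (0) cases followed by 5 minority (1) cases.
y_true = [0] * 95 + [1] * 5

# Classifier A predicts the majority class for every case.
y_pred = [0] * 100

# Classifier B flags 10 majority cases wrongly but finds 4 of the 5 minority cases.
y_pred_better = [1] * 10 + [0] * 85 + [0] + [1] * 4

print(imbalance_metrics(y_true, y_pred))
print(imbalance_metrics(y_true, y_pred_better))
```

Classifier A reaches an Accuracy of 0.95 with a TPR and G-mean of 0, whereas classifier B has a lower Accuracy (0.89) but a G-mean of roughly 0.85, which is the kind of trade-off the minority-class-oriented metrics are meant to expose.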