Clustering based bagging algorithm on imbalanced data sets

Authors:
Xiao-Yan Sun; Hua-Xiang Zhang; Zhi-Chao Wang
Affiliations:
Department of Information Science and Engineering, Shandong Normal University, and Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong, China (all three authors)
Venue:
IUKM'11: Proceedings of the 2011 International Conference on Integrated Uncertainty in Knowledge Modelling and Decision Making
Year:
2011

Abstract

Under-sampling the majority class is an effective way to handle imbalanced data sets in classification, but it has the drawback of discarding potentially useful majority-class information. To address this, we propose a Clustering Based Bagging Algorithm (CBBA). In CBBA, the majority class is clustered into several groups, and instances are randomly sampled from each group. The sampled instances are combined with the minority class instances and used to train a base classifier; repeating this sampling yields multiple base classifiers, whose predictions are combined to produce the final prediction. Experimental results show that our approach outperforms the under-sampling method.
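The abstract describes CBBA only at a high level. Below is a minimal, stdlib-only Python sketch of the idea under several assumptions that the abstract does not state: the majority class is clustered once with plain k-means (`k=3`); each bag draws from every cluster in proportion to its size, so that the bag's majority part is roughly as large as the minority class; the base learner is a hypothetical nearest-centroid classifier; and the base classifiers are combined by majority vote over an odd number of bags. All function and class names (`kmeans`, `sample_majority`, `NearestCentroid`, `cbba_fit`, `cbba_predict`) are illustrative, not taken from the paper.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed clustering method); returns non-empty clusters."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def sample_majority(clusters, n_target, rng):
    """Randomly sample from every cluster, proportionally to cluster size (assumption),
    so that all regions of the majority class are represented in the bag."""
    total = sum(len(c) for c in clusters)
    sample = []
    for c in clusters:
        m = min(len(c), max(1, round(len(c) * n_target / total)))
        sample.extend(rng.sample(c, m))  # without replacement inside a cluster
    return sample


class NearestCentroid:
    """Hypothetical stand-in base learner; the abstract does not name one."""

    def fit(self, X, y):
        self.centroids = {
            lab: mean([x for x, t in zip(X, y) if t == lab]) for lab in set(y)
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab]))


def cbba_fit(majority, minority, k=3, n_bags=11, seed=0):
    """Cluster the majority class once, then train one base classifier per bag.
    Labels: 0 = majority, 1 = minority."""
    rng = random.Random(seed)
    clusters = kmeans(majority, k, rng)
    models = []
    for _ in range(n_bags):
        maj = sample_majority(clusters, len(minority), rng)
        X = maj + minority  # sampled majority + all minority instances
        y = [0] * len(maj) + [1] * len(minority)
        models.append(NearestCentroid().fit(X, y))
    return models


def cbba_predict(models, x):
    """Combine base classifiers by majority vote (assumed combination rule)."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic 2-D data made up for illustration: majority in three blobs, minority in one.
    majority = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
                for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(100)]
    minority = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(20)]
    models = cbba_fit(majority, minority)
    print(cbba_predict(models, (5.0, 5.0)))  # point in the minority region
    print(cbba_predict(models, (0.0, 0.0)))  # point in a majority blob
```

Clustering once and sampling from every cluster is what distinguishes this sketch from plain random under-sampling: each bag is guaranteed to contain instances from each majority-class region rather than whatever a single uniform draw happens to pick. An odd `n_bags` avoids ties in the two-class vote; other combination rules (for example, averaging scores) would be equally plausible readings of "combining those classifiers".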