Clustering based bagging algorithm on imbalanced data sets

  • Authors:
  • Xiao-Yan Sun; Hua-Xiang Zhang; Zhi-Chao Wang

  • Affiliations:
  • Department of Information Science and Engineering, Shandong Normal University and Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong, China (all three authors)

  • Venue:
  • IUKM'11: Proceedings of the 2011 International Conference on Integrated Uncertainty in Knowledge Modelling and Decision Making
  • Year:
  • 2011

Abstract

Under-sampling the majority class is an effective way to classify imbalanced data sets, but it discards potentially useful information contained in the majority instances that are left out. To address this deficiency, we propose a Clustering Based Bagging Algorithm (CBBA). In CBBA, the majority class is clustered into several groups, and instances are randomly sampled from each group. Each such sample is combined with the minority class instances and used to train a base classifier, and final predictions are produced by combining these base classifiers. Experimental results show that our approach outperforms the under-sampling method.
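The abstract does not specify the clustering method, the number of groups, how many instances are drawn per group, the base learner, or how the classifiers are combined. The Python sketch below fills those gaps with assumptions of its own and should not be read as the authors' implementation: k-means (Lloyd's algorithm) with a hypothetical k = 3, per-cluster sampling proportional to cluster size so the majority sample roughly matches the minority class size, a 1-nearest-neighbour base classifier, and majority voting. All function names (`cbba_fit`, `cbba_predict`, `one_nn_predict`, and the helpers) are illustrative. Because 1-NN is a lazy learner, "training" a base classifier here just means storing its balanced training set.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def cbba_fit(majority, minority, k=3, n_estimators=5, seed=0):
    """Build one balanced training set per base classifier (0 = majority, 1 = minority)."""
    rng = random.Random(seed)
    clusters = kmeans(majority, k, rng)
    ratio = len(minority) / len(majority)
    training_sets = []
    for _ in range(n_estimators):
        sample = []
        for cluster in clusters:
            # Assumption: draw from each cluster in proportion to its size (at least one
            # instance), so the majority sample roughly matches the minority class size.
            n = min(len(cluster), max(1, round(len(cluster) * ratio)))
            sample.extend(rng.sample(cluster, n))
        training_sets.append([(p, 0) for p in sample] + [(p, 1) for p in minority])
    return training_sets


def one_nn_predict(train, x):
    """Hypothetical base classifier: label of the nearest training instance."""
    return min(train, key=lambda item: sq_dist(item[0], x))[1]


def cbba_predict(training_sets, x):
    """Combine the base classifiers by majority vote (an assumed combination rule)."""
    votes = Counter(one_nn_predict(t, x) for t in training_sets)
    return votes.most_common(1)[0][0]


# Toy data: two majority blobs and a small minority group far above them.
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
majority += [(10.0 + i % 5, float(i // 5)) for i in range(20)]
minority = [(6.0 + i % 2, 20.0 + i // 2) for i in range(4)]

models = cbba_fit(majority, minority)
print(cbba_predict(models, (6.5, 20.5)))  # → 1
print(cbba_predict(models, (2.0, 1.0)))   # → 0
```

Sampling from every cluster, rather than uniformly from the whole majority class, is what keeps each balanced training set spread across all regions of the majority class; with an odd number of base classifiers and two labels, the vote cannot tie.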