Ensembles of decision trees for imbalanced data

  • Authors:
  • Juan J. Rodríguez; José F. Díez-Pastor; César García-Osorio

  • Affiliations:
  • University of Burgos, Spain; University of Burgos, Spain; University of Burgos, Spain

  • Venue:
  • MCS'11: Proceedings of the 10th International Conference on Multiple Classifier Systems
  • Year:
  • 2011

Abstract

Ensembles of decision trees are studied for imbalanced datasets. Conventional decision trees (C4.5) and trees designed for imbalanced data (CCPDT: Class Confidence Proportion Decision Tree) are used as base classifiers. Ensemble methods for imbalanced data, based on undersampling and oversampling, are considered, as well as conventional ensemble methods that are not specific to imbalanced data: Bagging, Random Subspaces, AdaBoost, Real AdaBoost, MultiBoost and Rotation Forest. The results show that the choice of ensemble method matters far more than the type of decision tree used as the base classifier. Rotation Forest obtains the best results among the ensemble methods, and CCPDT shows no advantage over C4.5 as the base decision tree.
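
As a rough illustration of the kind of comparison described above (not the authors' actual experimental protocol), the following scikit-learn sketch trains two ensembles of decision trees on a synthetic imbalanced dataset and compares them with an imbalance-aware metric. The dataset, parameters, and AUC scoring are illustrative assumptions; C4.5, CCPDT, Rotation Forest, and the paper's resampling-based ensembles are not part of scikit-learn, so standard Bagging and AdaBoost stand in for the ensemble methods compared.

    # Illustrative sketch only: compares two decision-tree ensembles on an
    # imbalanced problem. It does not reproduce the paper's methods or data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic imbalanced problem, roughly 9:1 majority-to-minority ratio.
    X, y = make_classification(n_samples=2000, n_classes=2,
                               weights=[0.9, 0.1], n_informative=5,
                               random_state=0)

    # Unpruned CART trees stand in for the C4.5 base classifiers.
    ensembles = {
        "Bagging": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                     n_estimators=50, random_state=0),
        "AdaBoost": AdaBoostClassifier(DecisionTreeClassifier(random_state=0),
                                       n_estimators=50, random_state=0),
    }

    for name, clf in ensembles.items():
        # AUC is a common choice for evaluating classifiers on imbalanced data.
        scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: mean AUC = {scores.mean():.3f}")

Resampling-based ensembles of the kind studied in the paper could be approximated by adding random undersampling or oversampling of the training folds before fitting each ensemble, for example with the separate imbalanced-learn library, but that is outside this minimal sketch.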