Bagging decision trees on data sets with classification noise

  • Authors:
  • Joaquín Abellán; Andrés R. Masegosa

  • Affiliations:
  • Department of Computer Science and Artificial Intelligence, University of Granada, Spain (both authors)

  • Venue:
  • FoIKS'10 Proceedings of the 6th international conference on Foundations of Information and Knowledge Systems
  • Year:
  • 2010

Abstract

In many real applications of supervised classification, the data sets used to learn the models contain classification noise (some instances of the data set have wrong class labels), mainly due to deficiencies in the data capture process. Bagging ensembles of decision trees are considered among the best-performing supervised classification models in these situations. In this paper, we propose Bagging ensembles of credal decision trees, which are based on imprecise probabilities, via the Imprecise Dirichlet Model, and on information-based uncertainty measures, via the maximum-entropy function. We note that our method can be applied to data sets with continuous variables and missing data. In an experimental study, we show that Bagging credal decision trees outperforms more complex Bagging approaches on data sets with classification noise. Furthermore, using a bias-variance decomposition of the error, we justify the performance of our approach, showing that it achieves a stronger and more robust reduction of the variance component.
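To make the split criterion of credal decision trees concrete, here is a minimal, illustrative sketch (not the authors' implementation) of the maximum entropy over the credal set induced by the Imprecise Dirichlet Model. Under the IDM with parameter `s`, the probability of class `i` with count `n_i` out of `N` instances lies in `[n_i/(N+s), (n_i+s)/(N+s)]`; the maximum-entropy distribution in this set can be found by distributing the extra mass `s/(N+s)` over the smallest lower probabilities so as to level them up (a water-filling scheme). The function name and tolerances below are our own choices for illustration.

```python
import math

def max_entropy_idm(counts, s=1.0):
    """Maximum entropy over the IDM credal set for given class counts.

    Each class probability lies in [n_i/(N+s), (n_i+s)/(N+s)]; any way
    of splitting the extra mass s/(N+s) among the classes stays inside
    the credal set, so the entropy is maximized by water-filling: raise
    the smallest probabilities until the extra mass is exhausted.
    """
    total = sum(counts) + s
    probs = [c / total for c in counts]   # lower probabilities
    mass = s / total                      # extra mass to distribute
    while mass > 1e-12:
        m = min(probs)
        low = [i for i, p in enumerate(probs) if abs(p - m) < 1e-12]
        larger = [p for p in probs if p > m + 1e-12]
        nxt = min(larger) if larger else 1.0
        need = (nxt - m) * len(low)       # mass needed to reach next level
        add = min(mass, need) / len(low)
        for i in low:
            probs[i] += add
        mass -= min(mass, need)
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For example, with counts `[3, 1]` and `s = 1`, the water-filling assigns the whole extra mass to the minority class, giving the distribution `(0.6, 0.4)` and an upper entropy of about 0.971 bits; in a credal tree, this upper entropy plays the role that Shannon entropy plays in classical information-gain splitting.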