Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction

  • Authors:
  • Branko Kavsek;Nada Lavrac;Anuska Ferligoj

  • Affiliations:
  • -;-;-

  • Venue:
  • EMCL '01 Proceedings of the 12th European Conference on Machine Learning
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

In data analysis, induction of decision trees serves two main goals: first, induced decision trees can be used for classification /prediction of new instances, and second, they represent an easy-to-interpret model of the problem domain that can be used for explanation. The accuracy of the induced classifier is usually estimated using N-fold cross validation, whereas for explanation purposes a decision tree induced from all the available data is used. Decision tree learning is relatively non-robust: a small change in the training set may significantly change the structure of the induced decision tree. This paper presents a decision tree construction method in which the domain model is constructed by consensus clustering of N decision trees induced in N-fold cross-validation. Experimental results show that consensus decision trees are simpler than C4.5 decision trees, indicating that they may be a more stable approximation of the intended domain model than decision tree, constructed from the entire set of training instances.