A comparison of approaches for learning probability trees

  • Authors:
  • Daan Fierens;Jan Ramon;Hendrik Blockeel;Maurice Bruynooghe

  • Affiliations:
  • Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium

  • Venue:
  • ECML'05 Proceedings of the 16th European conference on Machine Learning
  • Year:
  • 2005

Quantified Score

Hi-index 0.01

Visualization

Abstract

Probability trees (or Probability Estimation Trees, PET's) are decision trees with probability distributions in the leaves. Several alternative approaches for learning probability trees have been proposed but no thorough comparison of these approaches exists. In this paper we experimentally compare the main approaches using the relational decision tree learner Tilde (both on non-relational and on relational datasets). Next to the main existing approaches, we also consider a novel variant of an existing approach based on the Bayesian Information Criterion (BIC). Our main conclusion is that overall trees built using the C4.5-approach or the C4.4-approach (C4.5 without post-pruning) have the best predictive performance. If the number of classes is low, however, BIC performs equally well. An additional advantage of BIC is that its trees are considerably smaller than trees for the C4.5- or C4.4-approach.