In practice, learning from data is often hampered by a limited number of training examples. In this paper, we empirically investigate several probability estimation tree algorithms on eighteen binary classification problems as the size of the training data varies. Nine metrics are used to evaluate their performance. Our aggregated results show that ensemble trees consistently outperform single trees. Confusion factor trees (CFTs) remain poorly calibrated even as the training size increases, which indicates that CFTs are potentially biased even on data sets with little noise. We also provide an analysis of the observed performance of the tree algorithms.
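As an illustrative sketch of the kind of comparison the abstract describes, the snippet below contrasts the probability estimates of a single unpruned decision tree with those of an ensemble on a synthetic noisy binary task. This is an assumption-laden stand-in, not the paper's protocol: it uses scikit-learn's `DecisionTreeClassifier` and `RandomForestClassifier` (confusion factor trees are not available there), a generated data set rather than the paper's eighteen benchmarks, and only two of the nine metrics (Brier score for calibration, AUC for ranking).

```python
# Hedged sketch (assumes scikit-learn is installed): compare the
# probability calibration of a single unpruned tree vs. an ensemble.
# Brier score: lower is better; AUC: higher is better.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem with 10% label noise (flip_y).
X, y = make_classification(n_samples=2000, n_features=20,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_tr, y_tr)

for name, model in [("single tree", single), ("ensemble", forest)]:
    p = model.predict_proba(X_te)[:, 1]  # P(class = 1)
    print(f"{name}: Brier={brier_score_loss(y_te, p):.3f}  "
          f"AUC={roc_auc_score(y_te, p):.3f}")
```

An unpruned single tree outputs near-0/1 probabilities from tiny leaves, so under label noise its Brier score degrades, while averaging over the ensemble smooths the estimates; this mirrors the aggregated finding that ensemble trees outperform single trees.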