In practice, learning from data is often hampered by a limited number of training examples. In this paper, we empirically investigate several probability estimation tree algorithms on eighteen binary classification problems as the size of the training data varies. Nine metrics are used to evaluate their performance. Our aggregated results show that ensemble trees consistently outperform single trees. Confusion factor trees (CFTs) remain poorly calibrated even as the training size increases, which indicates that CFTs are potentially biased even on data sets with little noise. We also provide an analysis of the observed performance of the tree algorithms.
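As an illustrative sketch of the kind of comparison the abstract describes, the snippet below contrasts the probability estimates of a single unpruned decision tree with those of an ensemble on a synthetic noisy binary task. This is an assumption-laden stand-in, not the paper's protocol: it uses scikit-learn's `DecisionTreeClassifier` and `RandomForestClassifier` (confusion factor trees are not available there), a generated data set rather than the paper's eighteen benchmarks, and only two of the nine metrics (Brier score for calibration, AUC for ranking).

```python
# Hedged sketch (assumes scikit-learn is installed): compare the
# probability calibration of a single unpruned tree vs. an ensemble.
# Brier score: lower is better; AUC: higher is better.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem with 10% label noise (flip_y).
X, y = make_classification(n_samples=2000, n_features=20,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_tr, y_tr)

for name, model in [("single tree", single), ("ensemble", forest)]:
    p = model.predict_proba(X_te)[:, 1]  # P(class = 1)
    print(f"{name}: Brier={brier_score_loss(y_te, p):.3f}  "
          f"AUC={roc_auc_score(y_te, p):.3f}")
```

An unpruned single tree outputs near-0/1 probabilities from tiny leaves, so under label noise its Brier score degrades, while averaging over the ensemble smooths the estimates; this mirrors the aggregated finding that ensemble trees outperform single trees.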