Machine learning, neural and statistical classification
Automatic Learning Techniques in Power Systems
On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality
Data Mining and Knowledge Discovery
Investigation and Reduction of Discretization Variance in Decision Tree Induction
ECML '00 Proceedings of the 11th European Conference on Machine Learning
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
This paper investigates enhancements of decision tree bagging aimed mainly at improving computation times, and also accuracy. Three aspects are reconsidered: the discretization of continuous attributes, tree pruning, and sampling schemes. A very simple discretization procedure is proposed, resulting in a dramatic speedup without a significant decrease in accuracy. A new method is then proposed to prune an ensemble of trees in a combined fashion, which is significantly more effective than pruning each tree individually. Finally, different resampling schemes are considered, leading to different CPU time/accuracy tradeoffs. Combining all these enhancements makes it possible to apply tree bagging to very large datasets, with computational performance similar to single-tree induction. Simulations are carried out on two synthetic databases and four real-life datasets.
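To make the abstract's ideas concrete, here is a minimal, self-contained sketch (not the paper's implementation) of tree bagging with a coarse discretization of a continuous attribute and a sub-sampling ratio that trades CPU time for accuracy. The single-split "stump" learner, the 16-point threshold grid, and the `sample_ratio` parameter are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dataset: class 1 when x > 0.5, with 10% label noise.
X = rng.random(400)
y = ((X > 0.5) ^ (rng.random(400) < 0.1)).astype(int)

def fit_stump(x, t):
    # Single-split "tree": choose, over a coarse equal-width grid,
    # the threshold minimizing training error. The fixed grid stands
    # in for a very simple discretization of the continuous attribute.
    thresholds = np.linspace(x.min(), x.max(), 16)
    return min(thresholds, key=lambda th: np.mean((x > th) != t))

def bagged_predict(x_new, X, y, n_trees=25, sample_ratio=0.5):
    # sample_ratio < 1 shrinks each bootstrap sample, cutting CPU time
    # at some potential cost in accuracy (the tradeoff the paper studies).
    votes = np.zeros_like(x_new, dtype=float)
    m = int(sample_ratio * len(X))
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=m)  # resample with replacement
        th = fit_stump(X[idx], y[idx])
        votes += (x_new > th)
    return (votes / n_trees > 0.5).astype(int)  # majority vote

pred = bagged_predict(np.array([0.1, 0.9]), X, y)
```

With the noise-free decision boundary at 0.5, the ensemble vote classifies points far from the boundary (such as 0.1 and 0.9) reliably even though each stump is trained on only half the data.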