Machine learning, neural and statistical classification
Automatic Learning Techniques in Power Systems
On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality
Data Mining and Knowledge Discovery
Investigation and Reduction of Discretization Variance in Decision Tree Induction
ECML '00 Proceedings of the 11th European Conference on Machine Learning
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
This paper investigates enhancements of decision tree bagging aimed mainly at improving computation times, and also accuracy. Three aspects are reconsidered: the discretization of continuous attributes, tree pruning, and sampling schemes. A very simple discretization procedure is proposed, resulting in a dramatic speedup without a significant decrease in accuracy. A new method is then proposed to prune an ensemble of trees in a combined fashion, which is significantly more effective than pruning each tree individually. Finally, different resampling schemes are considered, leading to different CPU time/accuracy tradeoffs. Combining all these enhancements makes it possible to apply tree bagging to very large datasets, with computational performance similar to single-tree induction. Simulations are carried out on two synthetic databases and four real-life datasets.
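To make the abstract's ideas concrete, here is a minimal, self-contained sketch (not the paper's implementation) of tree bagging with a coarse discretization of a continuous attribute and a sub-sampling ratio that trades CPU time for accuracy. The single-split "stump" learner, the 16-point threshold grid, and the `sample_ratio` parameter are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dataset: class 1 when x > 0.5, with 10% label noise.
X = rng.random(400)
y = ((X > 0.5) ^ (rng.random(400) < 0.1)).astype(int)

def fit_stump(x, t):
    # Single-split "tree": choose, over a coarse equal-width grid,
    # the threshold minimizing training error. The fixed grid stands
    # in for a very simple discretization of the continuous attribute.
    thresholds = np.linspace(x.min(), x.max(), 16)
    return min(thresholds, key=lambda th: np.mean((x > th) != t))

def bagged_predict(x_new, X, y, n_trees=25, sample_ratio=0.5):
    # sample_ratio < 1 shrinks each bootstrap sample, cutting CPU time
    # at some potential cost in accuracy (the tradeoff the paper studies).
    votes = np.zeros_like(x_new, dtype=float)
    m = int(sample_ratio * len(X))
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=m)  # resample with replacement
        th = fit_stump(X[idx], y[idx])
        votes += (x_new > th)
    return (votes / n_trees > 0.5).astype(int)  # majority vote

pred = bagged_predict(np.array([0.1, 0.9]), X, y)
```

With the noise-free decision boundary at 0.5, the ensemble vote classifies points far from the boundary (such as 0.1 and 0.9) reliably even though each stump is trained on only half the data.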