In this paper we present a study of the Random Forest (RF) family of ensemble methods. In a "classical" RF induction process, a fixed number of randomized decision trees is induced to form an ensemble. This kind of algorithm has two main drawbacks: (i) the number of trees has to be fixed a priori; (ii) the interpretability and analysis capacities offered by decision tree classifiers are lost due to the randomization principle. Because trees are added to the ensemble independently of one another, there is no guarantee that they will all cooperate effectively in the same committee. This observation raises two questions: are there decision trees in an RF that degrade the performance of the ensemble? If so, is it possible to form a more accurate committee by removing the poorly performing trees? We tackle these questions as a classifier selection problem, and show that better subsets of decision trees can be obtained even with a sub-optimal classifier selection method. This demonstrates that the "classical" RF induction process, in which randomized trees are arbitrarily added to the ensemble, is not the best approach for producing accurate RF classifiers. We also show the interest of designing RF classifiers by adding trees in a more dependent way than is traditionally done in "classical" RF induction algorithms.
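To make the classifier-selection idea concrete, here is a minimal sketch, not the paper's actual protocol: it grows a "classical" Random Forest with scikit-learn, then applies Sequential Forward Selection, one plausible sub-optimal selection heuristic of the kind the abstract alludes to, choosing trees by majority-vote accuracy on a held-out validation set. The dataset, split sizes, and stopping rule are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative setup: the digits dataset stands in for the paper's benchmarks.
X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# "Classical" RF induction: a fixed number of independently randomized trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Pre-compute each tree's validation predictions once, as integer class labels.
all_preds = np.stack([t.predict(X_val).astype(int) for t in rf.estimators_])
n_classes = len(rf.classes_)

def vote_accuracy(idx):
    # Validation accuracy of the plurality vote over the trees indexed by idx.
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes), 0, all_preds[idx])
    return accuracy_score(y_val, counts.argmax(axis=0))

# Sequential Forward Selection: greedily add the tree that most improves
# committee accuracy; stop once no remaining candidate helps.
selected, remaining, best_acc = [], list(range(len(rf.estimators_))), 0.0
while remaining:
    accs = [vote_accuracy(selected + [j]) for j in remaining]
    i = int(np.argmax(accs))
    if selected and accs[i] <= best_acc:
        break
    best_acc = accs[i]
    selected.append(remaining.pop(i))

print(f"full forest ({len(rf.estimators_)} trees): "
      f"{accuracy_score(y_val, rf.predict(X_val)):.4f}")
print(f"selected sub-ensemble ({len(selected)} trees): {best_acc:.4f}")

Note that this sketch selects and scores the sub-ensemble on the same validation data, so a separate test set would be needed for an unbiased estimate of the selected committee's accuracy.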