In this paper we present our work on the Random Forest (RF) family of classification methods. Our goal is to deepen the understanding of RF mechanisms by studying the parametrization of the reference algorithm Forest-RI. In this algorithm, a randomization principle is applied during tree induction: at each node, K features are selected at random, and the best split is chosen among them. The strength of the randomization in tree induction is thus controlled by the hyperparameter K, which plays an important role in building accurate RF classifiers. We therefore focus our experimental study on this hyperparameter and its influence on classification accuracy. To that end, we evaluated the Forest-RI algorithm on several machine learning problems with different settings of K in order to understand how it affects RF performance. We show that the default values of K traditionally used in the literature are globally near-optimal, except for some cases in which they are all significantly sub-optimal. Additional experiments were therefore conducted on those datasets, highlighting the crucial role played by feature relevancy in finding the optimal setting of K.
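The per-node randomization step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a Gini impurity split criterion and axis-aligned threshold splits, and all function names are illustrative.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split_forest_ri(X, y, K, rng=random):
    """At a single node, draw K candidate features at random (the
    Forest-RI randomization step) and return the (feature, threshold)
    pair with the lowest weighted Gini impurity among those candidates.
    Splits send samples with X[i][f] <= t to the left child."""
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), K)  # K controls randomization strength
    best, best_score = None, float("inf")
    for f in candidates:
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best, best_score
```

With K equal to the total number of features the procedure degenerates into a classical deterministic split search; smaller K yields stronger randomization, which is exactly the trade-off the hyperparameter study examines.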