Variable selection using random forests

Authors:
Robin Genuer;Jean-Michel Poggi;Christine Tuleau-Malot
Affiliations:
Laboratoire de Mathématiques, Université Paris-Sud 11, Bât. 425, 91405 Orsay, France;Laboratoire de Mathématiques, Université Paris-Sud 11, Bât. 425, 91405 Orsay, France and Université Paris 5 Descartes, France;Laboratoire Jean-Alexandre Dieudonné, Université Nice Sophia-Antipolis, Parc Valrose, 06108 Nice Cedex 02, France
Venue:
Pattern Recognition Letters
Year:
2010

References (11):
Bagging predictors. Machine Learning
Wrappers for feature subset selection. Artificial Intelligence - Special issue on relevance
An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning
Random Forests. Machine Learning
Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning
Ensemble Methods in Machine Learning. MCS '00 Proceedings of the First International Workshop on Multiple Classifier Systems
An introduction to variable and feature selection. The Journal of Machine Learning Research
Variable selection using svm based criteria. The Journal of Machine Learning Research
Use of the zero norm with linear models and kernel methods. The Journal of Machine Learning Research
Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis
Consistency of Random Forests and Other Averaging Classifiers. The Journal of Machine Learning Research

Cited by (8):
Separating the wheat from the chaff: on feature selection and feature importance in regression random forests and symbolic regression. Proceedings of the 13th annual conference companion on Genetic and evolutionary computation
A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling & Software
A hybrid KMV model, random forests and rough set theory approach for credit rating. Knowledge-Based Systems
Analysis of a random forests model. The Journal of Machine Learning Research
Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
A new variable selection approach using Random Forests. Computational Statistics & Data Analysis
Feature subset selection Filter-Wrapper based on low quality data. Expert Systems with Applications: An International Journal
Feature selection for high-dimensional multi-category data using PLS-based local recursive feature elimination. Expert Systems with Applications: An International Journal

Abstract

Focusing on random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001, this paper investigates two classical issues of variable selection. The first is to find important variables for interpretation; the second, more restrictive, is to design a good parsimonious prediction model. The main contribution is twofold: to provide experimental insights into the behavior of the variable importance index based on random forests, and to propose a strategy that ranks the explanatory variables by their random forests importance score and then introduces them through a stepwise ascending procedure.
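
The two-step strategy named in the abstract (rank by importance, then introduce variables stepwise in ascending fashion) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract gives no details of the acceptance rule, so the `min_gain` threshold is an assumption, and the function names (`rank_by_importance`, `stepwise_ascending_selection`) are hypothetical. The importance scores and the error estimate are toy stand-ins passed in from outside; in practice both would presumably be computed from a fitted random forest.

```python
from typing import Callable, Dict, List, Sequence


def rank_by_importance(importance: Dict[str, float]) -> List[str]:
    """Return variable names sorted from most to least important."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def stepwise_ascending_selection(
    ranked: Sequence[str],
    error_of: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Introduce variables one at a time, following the ranking.

    A variable is kept only if adding it lowers the estimated error by more
    than `min_gain`. This acceptance rule is an assumption made for the
    sketch; it is not taken from the abstract.
    """
    selected: List[str] = []
    best_error = float("inf")
    for name in ranked:
        candidate = selected + [name]
        err = error_of(candidate)
        if best_error - err > min_gain:
            selected, best_error = candidate, err
    return selected


# Toy stand-in for importance scores (hypothetical values).
toy_importance = {"x1": 0.40, "x2": 0.05, "x3": 0.30, "x4": 0.01}


def toy_error(variables: List[str]) -> float:
    """Hypothetical error estimate: only x1 and x3 carry signal, and every
    variable in the model adds a small complexity penalty."""
    base = 0.50
    gain = {"x1": 0.25, "x3": 0.15}
    return base - sum(gain.get(v, 0.0) for v in variables) + 0.01 * len(variables)


ranked = rank_by_importance(toy_importance)
selected = stepwise_ascending_selection(ranked, toy_error, min_gain=0.0)
print(ranked)    # variables ordered by decreasing importance
print(selected)  # variables retained by the stepwise pass
```

On this toy data the ranking puts `x1` and `x3` first, and the stepwise pass keeps only those two, since adding `x2` or `x4` raises the stand-in error. Passing the error estimator as a callable keeps the selection loop independent of any particular forest implementation.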