A new variable selection approach using Random Forests

Authors: A. Hapfelmeier; K. Ulm
Venue: Computational Statistics & Data Analysis
Year: 2013

Abstract

Random Forests are frequently applied because they achieve high prediction accuracy and can identify informative variables. Several variable selection approaches have been proposed to combine and strengthen these qualities. An extensive review of the corresponding literature led to the development of a new approach that is based on the theoretical framework of permutation tests and satisfies important statistical properties. A comparison with eight other popular variable selection methods in three simulation studies and four real data applications indicated that the new approach can also be used to control the test-wise and family-wise error rates, provides higher power to distinguish relevant from irrelevant variables, and leads to models that rank among the very best performing ones. In addition, it is equally applicable to regression and classification problems.
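To make the permutation-test idea in the abstract more concrete, here is a minimal Python sketch of generic permutation-test-based variable selection with test-wise or family-wise error control. It is not the authors' implementation. The `importance` argument is a hypothetical hook. In the Random Forest setting it would refit a forest on the data with variable j permuted and return that variable's importance. To keep the example dependency-free, it defaults to an absolute-correlation stand-in (`abs_correlation_importance`, a hypothetical helper). The Bonferroni threshold used for family-wise control is also an assumption chosen for simplicity.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical hook: importance(columns, y, j) -> importance score of variable j.
# A Random Forest version would refit the forest on `columns` and return the
# importance of column j; here a cheap stand-in is used instead (assumption).
Importance = Callable[[List[List[float]], Sequence[float], int], float]


def abs_correlation_importance(columns: List[List[float]], y: Sequence[float], j: int) -> float:
    """Stand-in importance: absolute Pearson correlation between column j and y."""
    x = columns[j]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information about y
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(columns: List[List[float]], y: Sequence[float], j: int,
                        importance: Importance, n_perm: int, rng: random.Random) -> float:
    """p-value for H0 'variable j is unrelated to y', from n_perm permutations of column j."""
    observed = importance(columns, y, j)
    permuted = [list(col) for col in columns]  # copy so the caller's data is untouched
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(permuted[j])  # permute only variable j, breaking its link to y
        if importance(permuted, y, j) >= observed:
            exceed += 1
    # The add-one correction keeps the p-value valid and never exactly zero.
    return (exceed + 1) / (n_perm + 1)


def select_variables(columns: List[List[float]], y: Sequence[float],
                     importance: Importance = abs_correlation_importance,
                     n_perm: int = 199, alpha: float = 0.05,
                     family_wise: bool = False, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Return (indices of selected variables, per-variable permutation p-values)."""
    rng = random.Random(seed)
    p_values = [permutation_p_value(columns, y, j, importance, n_perm, rng)
                for j in range(len(columns))]
    # Test-wise control: compare each p-value with alpha.
    # Family-wise control (Bonferroni, an assumption here): use alpha / #variables.
    threshold = alpha / len(columns) if family_wise else alpha
    selected = [j for j, p in enumerate(p_values) if p <= threshold]
    return selected, p_values
```

With a real forest, each call to `importance` inside the loop means refitting the model, so the number of permutations drives the cost. Bonferroni is only the simplest family-wise correction; a step-down procedure such as Holm's would control the same error rate with at least as much power.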