Using random subspace method for prediction and variable importance assessment in linear regression

Authors:
Jan Mielniczuk;Paweł Teisseyre
Affiliations:
Institute of Computer Science, Polish Academy of Sciences, Poland and Warsaw University of Technology, Faculty of Mathematics and Information Science, Poland;Institute of Computer Science, Polish Academy of Sciences, Poland
Venue:
Computational Statistics & Data Analysis
Year:
2014

Citing 7
The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence
Random Forests. Machine Learning
Ranking a random feature for variable and feature selection. The Journal of Machine Learning Research
Random subspace method for multivariate feature selection. Pattern Recognition Letters
Monte Carlo feature selection for supervised classification. Bioinformatics
Analysis and correction of bias in Total Decrease in Node Impurity measures for tree-based algorithms. Statistics and Computing
Modified versions of Bayesian Information Criterion for genome-wide association studies. Computational Statistics & Data Analysis

Cited 1
A high-dimensional two-sample test for the mean using random subspaces. Computational Statistics & Data Analysis

Abstract

A random subset method (RSM) with a new weighting scheme is proposed and investigated for linear regression with a large number of features. The weight of each variable is defined as the average of the squared values of its t-statistics over fitted models with randomly chosen features. It is argued that such weighting is advisable because it incorporates two factors: a measure of the importance of the variable within the considered model and a measure of the goodness-of-fit of the model itself. The asymptotic weights assigned by this scheme are determined, as are the assumptions under which the method leads to a consistent choice of the significant variables in the model. Numerical experiments indicate that the proposed method performs promisingly when its prediction errors are compared with those of penalty-based methods such as the lasso, and that it has a much smaller false discovery rate than the other methods considered.
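
The weighting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rsm_weights`, the parameters `subset_size`, `n_models` and `seed`, the inclusion of an intercept in every fitted model, and the synthetic data are all assumptions made here for illustration. Each random model is fitted by ordinary least squares, and every variable's weight is the average of its squared t-statistics over the models in which it was drawn.

```python
import numpy as np


def rsm_weights(X, y, subset_size, n_models=500, seed=0):
    """Average squared t-statistics over OLS fits on random feature subsets.

    Hypothetical helper for illustration; requires subset_size + 1 < n.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sums = np.zeros(p)    # running sum of squared t-statistics per feature
    counts = np.zeros(p)  # number of models in which each feature appeared

    for _ in range(n_models):
        # Draw a random subset of features without replacement.
        idx = rng.choice(p, size=subset_size, replace=False)
        # Design matrix: intercept column (an assumption) plus chosen features.
        A = np.column_stack([np.ones(n), X[:, idx]])
        beta = np.linalg.lstsq(A, y, rcond=None)[0]
        resid = y - A @ beta
        dof = n - A.shape[1]
        sigma2 = resid @ resid / dof
        # Standard errors from the usual OLS covariance estimate.
        cov = sigma2 * np.linalg.inv(A.T @ A)
        t_stats = beta[1:] / np.sqrt(np.diag(cov)[1:])
        sums[idx] += t_stats ** 2
        counts[idx] += 1

    # Features that were never drawn keep weight zero.
    return np.divide(sums, counts, out=np.zeros(p), where=counts > 0)


# Synthetic example (assumed data): only the first three features matter.
rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 3.0 * X[:, 1] + 3.0 * X[:, 2] + rng.standard_normal(n)

weights = rsm_weights(X, y, subset_size=5, n_models=500, seed=0)
top3 = np.argsort(weights)[-3:].tolist()  # indices of the three largest weights
print(sorted(top3))
```

Ranking the variables by `weights` gives an ordering from which a final model could then be chosen; how that choice is made, and how `subset_size` and `n_models` should be set, is not specified in the abstract and is left open in this sketch.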