A high-dimensional two-sample test for the mean using random subspaces

Authors:
Måns Thulin
Venue:
Computational Statistics & Data Analysis
Year:
2014


Abstract

A common problem in genetics is testing whether a set of highly dependent gene expressions differs between two populations, typically in a high-dimensional setting where the data dimension exceeds the sample size. Most high-dimensional tests for the equality of two mean vectors rely on naive diagonal or trace estimators of the covariance matrix and therefore ignore dependence between variables. A test using random subspaces is proposed; it offers higher power when the variables are dependent and is invariant under linear transformations of the marginal distributions. The p-values for the test are obtained using permutations. The test does not rely on assumptions about normality or the structure of the covariance matrix. Simulations show that the new test has higher power than competing tests in realistic settings motivated by microarray gene expression data. Computational aspects of high-dimensional permutation tests are also discussed, and an efficient R implementation of the proposed test is provided.
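
The abstract does not state the test statistic, so the following is only a minimal sketch of how a random-subspace permutation test of this kind could be organised, not the paper's R implementation. It assumes that each random subspace is a random subset of k variables, that the statistic is the average of two-sample Hotelling T^2 statistics over those subsets, and that the same subsets are reused for every permutation of the group labels. The function names (`hotelling_t2`, `random_subspace_statistic`, `random_subspace_test`) and the parameters `k`, `n_subspaces` and `n_perm` are hypothetical, and the sketch uses Python with NumPy.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 for samples x (n1 x k) and y (n2 x k)."""
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled sample covariance; invertible only if k <= n1 + n2 - 2.
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)
    return n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)


def random_subspace_statistic(x, y, subsets):
    """Average Hotelling T^2 over the given variable subsets (an assumption)."""
    return float(np.mean([hotelling_t2(x[:, s], y[:, s]) for s in subsets]))


def random_subspace_test(x, y, k, n_subspaces=50, n_perm=200, seed=0):
    """Return (statistic, permutation p-value) for H0: equal mean vectors."""
    n1, p = x.shape
    n2 = y.shape[0]
    if not 1 <= k <= min(p, n1 + n2 - 2):
        raise ValueError("k must satisfy 1 <= k <= min(p, n1 + n2 - 2)")
    rng = np.random.default_rng(seed)
    # Draw the variable subsets once and reuse them for every permutation.
    subsets = [rng.choice(p, size=k, replace=False) for _ in range(n_subspaces)]
    observed = random_subspace_statistic(x, y, subsets)

    combined = np.vstack([x, y])
    exceed = 0
    for _ in range(n_perm):
        # Permute group labels by shuffling the rows of the combined sample.
        idx = rng.permutation(n1 + n2)
        xp, yp = combined[idx[:n1]], combined[idx[n1:]]
        if random_subspace_statistic(xp, yp, subsets) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    x = data_rng.normal(size=(10, 40))        # 10 observations, 40 variables
    y = data_rng.normal(size=(10, 40)) + 1.0  # shifted mean in every variable
    stat, p_value = random_subspace_test(x, y, k=5)
    print(f"statistic = {stat:.2f}, p-value = {p_value:.3f}")
```

Because each subset contains fewer variables than the pooled degrees of freedom, the within-subset covariance matrix can be estimated and inverted in full, which is how a statistic of this form can take dependence between variables into account. The cost grows with the product of `n_subspaces` and `n_perm`, which is the kind of computational burden the abstract alludes to.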