Data-adaptive test statistics for microarray data

Authors:
Sach Mukherjee;Stephen J. Roberts;Mark J. Van Der Laan
Affiliations:
Department of Engineering Science, University of Oxford UK;Department of Engineering Science, University of Oxford UK;Division of Biostatistics, School of Public Health, University of California Berkeley, USA
Venue:
Bioinformatics
Year:
2005

Citing: 0
Cited by: 5

Reproducibility-Optimized Test Statistic for Ranking Genes in Microarray Studies
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Fuzzy-Adaptive-Subspace-Iteration-Based Two-Way Clustering of Microarray Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Interval based fuzzy systems for identification of important genes from microarray gene expression data: Application to carcinogenic development
Journal of Biomedical Informatics

Ranking function based on higher order statistics (RF-HOS) for two-sample microarray experiments
ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications

Evaluation of supervised and unsupervised 3D star visualisation algorithms
International Journal of Data Mining and Bioinformatics

Quantified Score

Hi-index	3.84

Abstract

Motivation: An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution.

Results: In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data.

Availability: By request to the corresponding author.

Contact: sach@robots.ox.ac.uk
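
The general idea of choosing a test statistic by the reproducibility of its selections can be illustrated with a short Python sketch. This is not the authors' implementation, and the abstract does not specify the details: the one-parameter family of statistics (a two-sample t-like score with a constant `a` added to the denominator), the random half-splits of the arrays, the top-`k` overlap measure, and all function names are assumptions made here purely for illustration. The sketch also assumes at least four arrays per group so that each half has a sample variance.

```python
import numpy as np

def regularized_t(x, y, a):
    """Per-gene two-sample score; `a` is a hypothetical denominator offset.

    x, y: arrays of shape (genes, arrays), one per condition.
    """
    diff = x.mean(axis=1) - y.mean(axis=1)
    se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1]
                 + y.var(axis=1, ddof=1) / y.shape[1])
    return diff / (se + a)

def top_k(scores, k):
    """Indices of the k genes with the largest absolute score."""
    return set(np.argsort(-np.abs(scores))[:k])

def reproducibility(x, y, a, k, n_splits, rng):
    """Mean fraction of top-k genes shared by two disjoint halves of the arrays."""
    hx, hy = x.shape[1] // 2, y.shape[1] // 2
    overlaps = []
    for _ in range(n_splits):
        px = rng.permutation(x.shape[1])
        py = rng.permutation(y.shape[1])
        s1 = regularized_t(x[:, px[:hx]], y[:, py[:hy]], a)
        s2 = regularized_t(x[:, px[hx:]], y[:, py[hy:]], a)
        overlaps.append(len(top_k(s1, k) & top_k(s2, k)) / k)
    return float(np.mean(overlaps))

def select_statistic(x, y, candidates, k=20, n_splits=50, seed=0):
    """Pick the candidate offset whose selections are most reproducible."""
    rng = np.random.default_rng(seed)
    scores = {a: reproducibility(x, y, a, k, n_splits, rng) for a in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

Note that no ground-truth labels for the genes enter the computation: the criterion depends only on agreement between selections made on different subsets of the arrays, which is the property the abstract emphasizes. Whether such agreement tracks expected loss depends on the assumptions made in the paper, which this sketch does not reproduce.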