RDCurve: A Nonparametric Method to Evaluate the Stability of Ranking Procedures

Authors:
Xin Lu; Anthony Gamst; Ronghui Xu
Affiliations:
University of California, San Diego, CA; University of California, San Diego, CA; University of California, San Diego, CA
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2010

Abstract

Serious concerns have been raised about the reproducibility of gene signatures based on high-throughput techniques such as microarrays. Studies analyzing similar samples often report poorly overlapping results, and the p-value usually lacks biological context. We propose a nonparametric ReDiscovery Curve (RDCurve) method to estimate how frequently an identified gene signature is rediscovered. Given a ranking procedure and a data set with replicated measurements, the RDCurve bootstraps the data set, repeatedly applies the ranking procedure, selects a subset of k important genes, and estimates the probability that the selected subset is rediscovered. We also propose a permutation scheme to estimate a confidence band under the null hypothesis, against which the significance of the RDCurve can be assessed. The method is nonparametric and model-independent. With the RDCurve, we can assess the signal-to-noise ratio of the data, compare the performance of ranking procedures in terms of their expected rediscovery rates, and choose the number of genes to be reported.
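
The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions, because the abstract does not specify them: the ranking procedure is a hypothetical absolute difference in group means, the bootstrap resamples replicates within each group, the rediscovery rate is taken as the fraction of the top-k genes from the full data that reappear in the top k of each bootstrap resample, and the null band is obtained by permuting the group labels. All function and parameter names (`mean_difference_score`, `rediscovery_curve`, `null_band`, `n_boot`, `n_perm`) are illustrative.

```python
import numpy as np


def mean_difference_score(x, labels):
    """Hypothetical ranking statistic: |mean(group 1) - mean(group 0)| per gene.

    x is a (genes, samples) array; labels is a 0/1 array of length samples.
    """
    return np.abs(x[:, labels == 1].mean(axis=1) - x[:, labels == 0].mean(axis=1))


def stratified_bootstrap(labels, rng):
    """Resample sample indices with replacement, separately within each group."""
    parts = []
    for g in np.unique(labels):
        members = np.flatnonzero(labels == g)
        parts.append(rng.choice(members, size=members.size, replace=True))
    return np.concatenate(parts)


def rediscovery_curve(x, labels, ks, score_fn=mean_difference_score,
                      n_boot=100, seed=0):
    """Estimate, for each k, the mean fraction of the original top-k genes
    that are selected again in the top k of a bootstrap resample."""
    rng = np.random.default_rng(seed)
    ks = list(ks)
    order = np.argsort(-score_fn(x, labels))  # ranking on the full data
    rates = np.zeros(len(ks))
    for _ in range(n_boot):
        idx = stratified_bootstrap(labels, rng)
        boot_order = np.argsort(-score_fn(x[:, idx], labels[idx]))
        for j, k in enumerate(ks):
            rates[j] += np.intersect1d(order[:k], boot_order[:k]).size / k
    return rates / n_boot


def null_band(x, labels, ks, score_fn=mean_difference_score,
              n_perm=50, n_boot=100, quantiles=(0.025, 0.975), seed=0):
    """Assumed permutation scheme: shuffle group labels, recompute the curve,
    and return pointwise quantiles across permutations (shape 2 x len(ks))."""
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_perm):
        permuted = rng.permutation(labels)
        sub_seed = int(rng.integers(2**31))
        curves.append(rediscovery_curve(x, permuted, ks, score_fn, n_boot, sub_seed))
    return np.quantile(np.array(curves), quantiles, axis=0)
```

Under these assumptions, calling `rediscovery_curve(x, labels, ks)` on a genes-by-samples matrix gives one rediscovery rate per candidate k, and comparing it with the output of `null_band` suggests which values of k yield signatures that are rediscovered more often than label-permuted data would allow. Any other ranking procedure can be substituted through `score_fn`.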