Gene ranking using bootstrapped P-values

  • Authors:
  • S. N. Mukherjee; S. J. Roberts; P. Sykacek; S. J. Gurr

  • Affiliations:
  • University of Oxford, Oxford, U.K. (all authors)

  • Venue:
  • ACM SIGKDD Explorations Newsletter
  • Year:
  • 2003

Abstract

Recent research has shown that it is possible to find genes involved in the pathogenesis of a particular condition on the basis of microarray experiments. Genes that are differentially expressed, for example between healthy and diseased tissues, are likely to be relevant to the disease under study. Some properties of microarray datasets make finding these genes challenging. This paper proposes a gene-ranking algorithm whose main novelty is the use of bootstrapped P-values. We present an analysis of the algorithm, showing how it takes account of small-sample variability in observed values of the test statistic in a way that conventional statistical tests cannot. Experimental results show that our algorithm outperforms the widely used two-sample t-test on challenging artificial data. Gene ranking is then performed on two well-known microarray datasets, with encouraging results. For example, a number of genes from one of the datasets, whose differential expression was subsequently confirmed by a more reliable biochemical analysis, are ranked higher by the bootstrapped algorithm than by the conventional t-test, suggesting that the proposed algorithm may be better able to exploit the limited data available to infer biologically useful information.
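The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of ranking genes by a bootstrapped P-value, not the authors' implementation. It rests on several assumptions that are not taken from the paper: each class is resampled with replacement independently, the test statistic is a Welch-style t-statistic, its P-value is taken from a normal approximation (to stay within the Python standard library), and the per-resample P-values are summarised by their median. The function names and the toy data are hypothetical.

```python
import random
import statistics
from math import sqrt

_STD_NORMAL = statistics.NormalDist()

def approx_p_value(a, b):
    """Two-sided P-value for a Welch-style t-statistic.

    Assumption: a normal approximation replaces the Student t distribution.
    """
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        # Degenerate resample (no variation in either group);
        # treated as uninformative here, which is an assumption.
        return 1.0
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(t)))

def bootstrapped_p_value(a, b, n_boot=200, rng=None):
    """Median P-value over bootstrap resamples of the two groups."""
    rng = rng or random.Random(0)
    p_values = []
    for _ in range(n_boot):
        # Resample each class with replacement, keeping its original size.
        boot_a = rng.choices(a, k=len(a))
        boot_b = rng.choices(b, k=len(b))
        p_values.append(approx_p_value(boot_a, boot_b))
    # Assumption: the median is used as the summary of the bootstrap P-values.
    return statistics.median(p_values)

def rank_genes(expr_a, expr_b, n_boot=200, seed=0):
    """Return gene names sorted from smallest to largest bootstrapped P-value."""
    rng = random.Random(seed)
    scores = {
        gene: bootstrapped_p_value(expr_a[gene], expr_b[gene], n_boot, rng)
        for gene in expr_a
    }
    return sorted(scores, key=scores.get)

# Toy expression values (hypothetical): one gene differs between classes, one does not.
healthy = {
    "g_diff": [1.0, 1.2, 0.9, 1.1, 1.0],
    "g_null": [1.0, 1.2, 0.9, 1.1, 1.0],
}
diseased = {
    "g_diff": [3.0, 3.1, 2.9, 3.2, 3.0],
    "g_null": [1.1, 0.9, 1.0, 1.2, 1.0],
}
ranking = rank_genes(healthy, diseased)
print(ranking)
```

Because every resample yields its own P-value, the summary reflects how much the test result would vary under small-sample noise, which is the property the abstract attributes to the bootstrapped approach. A different summary (for example the mean or an upper quantile) could be substituted in `bootstrapped_p_value` without changing the rest of the sketch.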