Statistical absolute evaluation of gene ontology terms with gene expression data

  • Authors:
  • Pramod K. Gupta;Ryo Yoshida;Seiya Imoto;Rui Yamaguchi;Satoru Miyano

  • Affiliations:
  • Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan (all authors)

  • Venue:
  • ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications
  • Year:
  • 2007

Abstract

We propose a new testing procedure for the automatic ontological analysis of gene expression data. The objective of ontological analysis is to retrieve functional annotations, e.g. Gene Ontology terms, that are relevant to the cellular mechanisms underlying the gene expression profiles, and a large number of tools have been developed for this purpose. Most existing tools implement the same approach, which exploits rank statistics of genes ordered by the strength of statistical evidence, e.g. p-values computed by testing hypotheses at the individual gene level. However, such an approach often leads to serious false discoveries. In particular, one of its most crucial drawbacks is that it can wrongly judge an ontology term to be statistically significant even though all of the genes annotated by that term are irrelevant to the underlying cellular mechanisms. In this paper, we first point out some drawbacks of the rank-based approaches from a statistical point of view, and then propose a new testing procedure to overcome them. The proposed method is theoretically grounded in statistical meta-analysis, and the hypothesis to be tested is stated in a form suited to the problem of ontological analysis. We perform Monte Carlo experiments to highlight the disadvantages of the rank-based approach and the advantages of the proposed method. Finally, we demonstrate the applicability of the proposed method through an ontological analysis of gene expression data from human diabetes.
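
The contrast drawn in the abstract can be illustrated with a small sketch. The abstract does not specify the proposed test, so the code below is not the authors' procedure: it substitutes Fisher's combined probability test, a classical meta-analysis technique, as an assumed stand-in for an "absolute" evaluation, and compares it with a one-sided rank-sum (Mann-Whitney) test as an assumed representative of the rank-based approach. The p-values are invented toy data in which no gene annotated by the term is individually significant, yet all of them rank ahead of the background genes.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom when all k null hypotheses hold."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def rank_sum_pvalue(term_p, background_p):
    """One-sided Mann-Whitney test (normal approximation, no tie correction).
    A small result means the term's genes rank ahead of the background."""
    n1, n2 = len(term_p), len(background_p)
    # U counts the pairs in which the term gene has the smaller p-value.
    u = sum(1 for a in term_p for b in background_p if a < b)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail P(Z >= z)


# Hypothetical toy data: 10 genes annotated by the term, 100 background genes.
# Every gene-level p-value is at least 0.30, so no gene shows real evidence.
term_p = [0.30 + 0.01 * i for i in range(10)]
background_p = [0.50 + 0.005 * i for i in range(100)]

rank_p = rank_sum_pvalue(term_p, background_p)
fisher_p = fisher_combined_pvalue(term_p)
print(f"rank-based p-value:      {rank_p:.2e}")
print(f"Fisher combined p-value: {fisher_p:.3f}")
```

On this toy input the rank-sum test flags the term as highly significant only because its genes are relatively less unremarkable than the rest, whereas the combined test, which looks at the evidence carried by the term's own genes, does not reject at the 0.05 level.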