Exact score distribution computation for similarity searches in ontologies

  • Authors:
  • Marcel H. Schulz; Sebastian Köhler; Sebastian Bauer; Martin Vingron; Peter N. Robinson

  • Affiliations:
  • Max Planck Institute for Molecular Genetics, Berlin, Germany and International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany;Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany and Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin ...;Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany;Max Planck Institute for Molecular Genetics, Berlin, Germany;Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany and Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin ...

  • Venue:
  • WABI'09 Proceedings of the 9th international conference on Algorithms in bioinformatics
  • Year:
  • 2009

Abstract

Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., protein function prediction with the Gene Ontology. In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik's definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the Human Phenotype Ontology.
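
To make the quantities in the abstract concrete, the following is a minimal sketch of the naive approach that the proposed algorithm is compared against: it enumerates every possible query of a fixed size, scores each one against a target with a Resnik-style measure, and reads a P-value off the resulting exact distribution. It is not the subgraph-collapsing algorithm of the paper. The toy ontology, the annotations, all function names, the choice of averaging the best match per query term, and the null model of uniformly sampled term sets are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import log

# Hypothetical toy ontology (a DAG): term -> list of parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "AB": ["A", "B"],
}

# Hypothetical annotations: object -> directly annotated terms.
ANNOTATIONS = {
    "obj1": {"A1"},
    "obj2": {"A2", "B"},
    "obj3": {"AB"},
    "obj4": {"B"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), with n_t objects annotated to t or a descendant.

    Assumes every term has at least one annotated object (true for the toy data).
    """
    counts = Counter()
    for terms in ANNOTATIONS.values():
        counts.update(set().union(*(ancestors(t) for t in terms)))
    n = len(ANNOTATIONS)
    return {t: log(n / counts[t]) for t in PARENTS}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC[a] for a in ancestors(t1) & ancestors(t2))


def score(query, target):
    """Assumed set score: average over query terms of the best match in target."""
    return sum(max(resnik(q, t) for t in target) for q in query) / len(query)


def score_distribution(target, query_size):
    """Exact distribution by brute-force enumeration of all queries of one size."""
    queries = list(combinations(sorted(PARENTS), query_size))
    counts = Counter(round(score(q, target), 10) for q in queries)
    return {s: c / len(queries) for s, c in counts.items()}


def p_value(observed, distribution):
    """Probability that a random query scores at least as high as observed."""
    return sum(p for s, p in distribution.items() if s >= observed - 1e-9)


target = ANNOTATIONS["obj1"]
dist = score_distribution(target, query_size=1)
print(p_value(score({"A1"}, target), dist))
```

The number of queries enumerated by `score_distribution` grows combinatorially with the ontology size and the query size, which is why a brute-force computation like this one becomes impractical on a real ontology and a faster exact method is of interest.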