Multiple-sequence functional annotation and the generalized hidden Markov phylogeny

  • Authors: Jon D. McAuliffe; Lior Pachter; Michael I. Jordan

  • Affiliations: Department of Statistics, University of California, 367 Evans Hall, Berkeley, CA 94720, USA; Department of Mathematics, University of California, 970 Evans Hall, Berkeley, CA 94720, USA; Department of Statistics, University of California, 367 Evans Hall, Berkeley, CA 94720, USA

  • Venue: Bioinformatics
  • Year: 2004

Abstract

Motivation: Phylogenetic shadowing is a comparative genomics principle that allows for the discovery of conserved regions in sequences from multiple closely related organisms. We develop a formal probabilistic framework for combining phylogenetic shadowing with feature-based functional annotation methods. The resulting model, a generalized hidden Markov phylogeny (GHMP), applies to a variety of situations where functional regions are to be inferred from evolutionary constraints.

Results: We show how GHMPs can be used to predict complete shared gene structures in multiple primate sequences. We also describe shadower, our implementation of such a prediction system. We find that shadower outperforms previously reported ab initio gene finders, including comparative human–mouse approaches, on a small sample of diverse exonic regions. Finally, we report on an empirical analysis of shadower's performance, which reveals that as few as five well-chosen species may suffice to attain maximal sensitivity and specificity in exon demarcation.

Availability: A Web server is available at http://bonaire.lbl.gov/shadower
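The basic idea of combining phylogenetic shadowing with a hidden Markov model can be illustrated with a much simpler model than a full GHMP. The sketch below is a hypothetical two-state hidden Markov phylogeny (state 0 = conserved, state 1 = neutral) and is not shadower's actual model. Each state emits an alignment column with a likelihood computed by Felsenstein's pruning algorithm under a Jukes–Cantor substitution model. The conserved state scales every branch length by a factor below 1, and Viterbi decoding then labels each column. Unlike a generalized HMM, the sketch uses geometric state durations and has no exon, intron, or splice-site states. The tree, species names, branch lengths, rate factors, and transition probabilities are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

BASES = ("A", "C", "G", "T")
# A tree is either a leaf name (str) or a list of (subtree, branch_length) pairs.
Tree = Union[str, List[Tuple["Tree", float]]]


def jc69_prob(a: str, b: str, t: float) -> float:
    """Jukes-Cantor probability that base a becomes base b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partial_likelihood(node: Tree, column: Dict[str, str], rate: float) -> List[float]:
    """Felsenstein pruning: P(observed leaves below node | base at node), for each base."""
    if isinstance(node, str):
        base = column.get(node, "N")
        # Gaps or ambiguous characters are uninformative: every base is compatible.
        return [1.0 if base not in BASES or base == x else 0.0 for x in BASES]
    result = [1.0] * len(BASES)
    for child, length in node:
        child_lik = partial_likelihood(child, column, rate)
        for i, x in enumerate(BASES):
            # Sum over the child's base, with the branch length scaled by the state's rate.
            result[i] *= sum(
                jc69_prob(x, y, rate * length) * child_lik[j] for j, y in enumerate(BASES)
            )
    return result


def column_log_likelihood(tree: Tree, column: Dict[str, str], rate: float) -> float:
    """log P(column) with a uniform root distribution and all branches scaled by rate."""
    root = partial_likelihood(tree, column, rate)
    return math.log(sum(0.25 * p for p in root))


def viterbi(
    tree: Tree,
    columns: Sequence[Dict[str, str]],
    rates: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    log_init: Sequence[float],
) -> List[int]:
    """Most probable state path; state s emits columns with branch-length scale rates[s]."""
    n = len(rates)
    emit = [[column_log_likelihood(tree, col, r) for r in rates] for col in columns]
    score = [log_init[s] + emit[0][s] for s in range(n)]
    back: List[List[int]] = []
    for t in range(1, len(columns)):
        new_score, pointers = [], []
        for s in range(n):
            best = max(range(n), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][s] + emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical 4-species tree; branch lengths are illustrative, not estimated from data.
TREE: Tree = [([("s1", 0.3), ("s2", 0.3)], 0.2), ([("s3", 0.3), ("s4", 0.3)], 0.2)]
SPECIES = ("s1", "s2", "s3", "s4")
cols = [dict(zip(SPECIES, c)) for c in ["AAAA"] * 4 + ["ACGT"] * 4 + ["GGGG"] * 4]
stay, switch = math.log(0.9), math.log(0.1)
path = viterbi(
    TREE,
    cols,
    rates=[0.2, 1.0],  # state 0 = conserved (slowed), state 1 = neutral
    log_trans=[[stay, switch], [switch, stay]],
    log_init=[math.log(0.5)] * 2,
)
print(path)  # → [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

On this toy alignment, the identical columns are decoded as conserved and the fully variable columns as neutral, because slowing the substitution rate makes identical columns more probable and variable columns less probable. A GHMP along the lines described in the abstract would instead use functional states (for example, exons and introns) with explicit length distributions.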