On-line Viterbi algorithm for analysis of long biological sequences

  • Authors:
  • Rastislav Šrámek;Broňa Brejová;Tomáš Vinař

  • Affiliations:
  • Department of Computer Science, Comenius University, Bratislava, Slovakia;Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY;Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY

  • Venue:
  • WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
  • Year:
  • 2007

Abstract

Hidden Markov models (HMMs) are routinely used for analysis of long genomic sequences to identify various features such as genes, CpG islands, and conserved elements. The commonly used Viterbi algorithm requires O(mn) memory to annotate a sequence of length n with an m-state HMM, which is impractical for analyzing whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm for decoding HMMs in much smaller space. Our analysis shows that our algorithm has expected maximum memory Θ(m log n) on two-state HMMs. We also experimentally demonstrate that our algorithm significantly reduces the memory needed to decode a simple HMM for gene finding on both simulated and real DNA sequences, without a significant slow-down compared to the classical Viterbi algorithm.
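
To make the O(mn) memory cost concrete, the following is a minimal sketch of the classical Viterbi algorithm, not the on-line algorithm the paper introduces. It keeps one back pointer per state per sequence position, which is the table whose size grows with mn. The two-state model ("H" for GC-rich, "L" for GC-poor), its probabilities, and all function and variable names are illustrative assumptions and do not come from the paper.

```python
import math


def viterbi(obs, states, log_init, log_trans, log_emit):
    """Classical Viterbi decoding in log space.

    Returns the most probable state path and the number of stored
    back pointers, which is m * (n - 1) for m states and n symbols.
    """
    n = len(obs)
    # score[s]: best log-probability of any path ending in state s
    # at the current position. Only the latest column is kept.
    score = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    # back[i][s]: best predecessor of state s at position i.
    # This table is the O(mn) memory cost of the classical algorithm.
    back = [dict() for _ in range(n)]
    for i in range(1, n):
        new_score = {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][obs[i]]
            back[i][s] = prev
        score = new_score
    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: score[s])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    cells = sum(len(row) for row in back)
    return path, cells


def to_log(table):
    """Convert a dict (or dict of dicts) of probabilities to logs."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical two-state model; the numbers are made up for illustration.
STATES = ["H", "L"]
LOG_INIT = to_log({"H": 0.5, "L": 0.5})
LOG_TRANS = to_log({"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}})
LOG_EMIT = to_log({
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
})

path, cells = viterbi("AAAACCCC", STATES, LOG_INIT, LOG_TRANS, LOG_EMIT)
print("".join(path), cells)
```

In this sketch the traceback cannot begin until the whole sequence has been read, so every back pointer must be retained until the end. That retention is the memory cost the abstract describes as impractical for whole chromosomes.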