On-line Viterbi algorithm for analysis of long biological sequences

Authors:
Rastislav Šrámek; Broňa Brejová; Tomáš Vinař
Affiliations:
Department of Computer Science, Comenius University, Bratislava, Slovakia; Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY; Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY
Venue:
WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
Year:
2007

Abstract

Hidden Markov models (HMMs) are routinely used for analysis of long genomic sequences to identify various features such as genes, CpG islands, and conserved elements. The commonly used Viterbi algorithm requires O(mn) memory to annotate a sequence of length n with an m-state HMM, which is impractical for analyzing whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm for decoding HMMs in much smaller space. Our analysis shows that the algorithm has expected maximum memory Θ(m log n) on two-state HMMs. We also demonstrate experimentally that it significantly reduces the memory needed to decode a simple gene-finding HMM on both simulated and real DNA sequences, without a significant slow-down compared to the classical Viterbi algorithm.
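
To make the memory argument concrete, the following Python sketch contrasts classical Viterbi decoding, which stores one back-pointer column per sequence position (hence O(mn) memory), with a simplified incremental decoder. The abstract does not describe the data structure used by the on-line algorithm, so the second function is only an illustration under an assumption: it emits a prefix of the path whenever all surviving back-pointer paths merge into a single state, and then discards the resolved columns. It finds that merge point by a naive backward walk at every step, which is slower than what a practical implementation would do. All function names and the two-state parameters in the usage note are hypothetical.

```python
import math

def _log(p):
    # log(0) is treated as -infinity so impossible events never win a max.
    return math.log(p) if p > 0 else float("-inf")

def _step(prev, x, trans, emit):
    """Advance one position: return new log scores and best-predecessor pointers."""
    m = len(prev)
    score, ptr = [], []
    for s in range(m):
        best = max(range(m), key=lambda r: prev[r] + _log(trans[r][s]))
        ptr.append(best)
        score.append(prev[best] + _log(trans[best][s]) + _log(emit[s][x]))
    return score, ptr

def viterbi(obs, init, trans, emit):
    """Classical decoding: keeps all n-1 back-pointer columns (O(mn) memory)."""
    score = [_log(init[s]) + _log(emit[s][obs[0]]) for s in range(len(init))]
    back = []
    for x in obs[1:]:
        score, ptr = _step(score, x, trans, emit)
        back.append(ptr)
    state = max(range(len(score)), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def online_viterbi(obs, init, trans, emit):
    """Yield the same path incrementally, dropping resolved back-pointer columns.

    Simplified illustration (an assumption, not the published data structure).
    """
    it = iter(obs)
    first = next(it)
    score = [_log(init[s]) + _log(emit[s][first]) for s in range(len(init))]
    back = []  # only the columns whose states are not yet determined
    for x in it:
        score, ptr = _step(score, x, trans, emit)
        back.append(ptr)
        # Walk back from all current states until their ancestors merge.
        alive, k = set(range(len(score))), len(back)
        while k > 0 and len(alive) > 1:
            k -= 1
            alive = {back[k][s] for s in alive}
        if len(alive) == 1 and k > 0:
            # Every surviving path passes through one state at buffer index k,
            # so the states at indices 0..k-1 are final and can be emitted.
            state = next(iter(alive))
            segment = []
            for j in range(k - 1, -1, -1):
                state = back[j][state]
                segment.append(state)
            yield from reversed(segment)
            del back[:k]
    # End of input: ordinary traceback over whatever is still buffered.
    state = max(range(len(score)), key=score.__getitem__)
    tail = [state]
    for ptr in reversed(back):
        state = ptr[state]
        tail.append(state)
    yield from reversed(tail)
```

A usage example with made-up two-state parameters (state 0 favouring A/T, state 1 favouring C/G): with `init = [0.5, 0.5]`, `trans = [[0.9, 0.1], [0.2, 0.8]]` and `emit = [{"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}, {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}]`, calling `list(online_viterbi(seq, init, trans, emit))` should return the same state path as `viterbi(seq, init, trans, emit)`, since both use identical scores and tie-breaking. The difference is that the generator can release each resolved prefix before the whole sequence has been read.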