The most probable annotation problem in HMMs and its application to bioinformatics

Authors:
Broňa Brejová;Daniel G. Brown;Tomáš Vinař
Affiliations:
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA;David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada;Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
Venue:
Journal of Computer and System Sciences
Year:
2007

Citing 7
Cited 2

Finding the k Shortest Paths

SIAM Journal on Computing
The consensus string problem and the complexity of comparing hidden Markov models

Journal of Computer and System Sciences - Computational biology 2002
ESTScan: A Program for Detecting, Evaluating, and Reconstructing Potential Coding Regions in EST Sequences

Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
Two Methods for Improving Performance of a HMM and their Application for Gene Finding

Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology
Computational Complexity of Problems on Probabilistic Grammars and Transducers

ICGI '00 Proceedings of the 5th International Colloquium on Grammatical Inference: Algorithms and Applications
ExonHunter: a comprehensive approach to gene finding

Bioinformatics
Enhancements to hidden markov models for gene finding and other biological applications

Enhancements to hidden markov models for gene finding and other biological applications

The highest expected reward decoding for HMMs with application to recombination detection

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Semantics and Ambiguity of Stochastic RNA Family Models

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

Hidden Markov models (HMMs) are often used for biological sequence annotation. Each sequence feature is represented by a collection of states with the same label. In annotating a new sequence, we seek the sequence of labels that has the highest probability. Computing this most probable annotation was shown to be NP-hard by Lyngsø and Pedersen [R.B. Lyngsø, C.N.S. Pedersen, The consensus string problem and the complexity of comparing hidden Markov models, J. Comput. System Sci. 65 (3) (2002) 545-569]. We strengthen their result by showing that the problem remains NP-hard even for a specific, fixed HMM, and we present efficient algorithms that compute the most probable annotation for a large class of HMMs, including abstractions of models previously used for transmembrane protein topology prediction and coding region detection. We also present a small experiment showing that the maximum probability annotation is more accurate than the labelings produced by simpler heuristics.
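To illustrate why the most probable annotation can differ from the labeling of the single most probable state path, here is a minimal Python sketch. It is not the authors' algorithm: it enumerates every state path by brute force, which takes exponential time and is only feasible for toy inputs. The three-state model and all names in it (`STATES`, `LABEL`, `INIT`, `TRANS`, `EMIT`, and the helper functions) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy HMM: one state labeled "x", two states sharing label "y".
STATES = ["A", "B1", "B2"]
LABEL = {"A": "x", "B1": "y", "B2": "y"}
INIT = {"A": 0.4, "B1": 0.3, "B2": 0.3}
# Each state only transitions to itself (assumption made to keep the example tiny).
TRANS = {s: {t: 1.0 if s == t else 0.0 for t in STATES} for s in STATES}
# Every state emits the single symbol "a" with probability 1.
EMIT = {s: {"a": 1.0} for s in STATES}


def path_probability(path, seq):
    """Joint probability of a state path and the observed sequence."""
    p = INIT[path[0]] * EMIT[path[0]].get(seq[0], 0.0)
    for prev, cur, sym in zip(path, path[1:], seq[1:]):
        p *= TRANS[prev][cur] * EMIT[cur].get(sym, 0.0)
    return p


def annotation_probabilities(seq):
    """Sum path probabilities over all state paths that share a labeling."""
    totals = defaultdict(float)
    for path in product(STATES, repeat=len(seq)):
        labeling = "".join(LABEL[s] for s in path)
        totals[labeling] += path_probability(path, seq)
    return dict(totals)


def most_probable_annotation(seq):
    """Labeling with the highest total probability (brute force)."""
    totals = annotation_probabilities(seq)
    return max(totals, key=totals.get)


def best_path_annotation(seq):
    """Labeling of the single most probable state path.

    The Viterbi algorithm would find this path efficiently; brute force
    is used here only to keep the sketch short.
    """
    best = max(product(STATES, repeat=len(seq)),
               key=lambda path: path_probability(path, seq))
    return "".join(LABEL[s] for s in best)


if __name__ == "__main__":
    seq = "aa"
    print("best single path labeling:", best_path_annotation(seq))
    print("most probable annotation: ", most_probable_annotation(seq))
```

In this toy model the single best path stays in state A (probability 0.4) and yields the labeling "xx", while the two paths through B1 and B2 (0.3 each) share the labeling "yy" and together carry more probability mass. Summing over all paths with the same labeling is what distinguishes the most probable annotation from the most probable path.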