The most probable annotation problem in HMMs and its application to bioinformatics

  • Authors:
  • Broňa Brejová;Daniel G. Brown;Tomáš Vinař

  • Affiliations:
  • Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA;David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada;Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA

  • Venue:
  • Journal of Computer and System Sciences
  • Year:
  • 2007

Abstract

Hidden Markov models (HMMs) are often used for biological sequence annotation. Each sequence feature is represented by a collection of states with the same label. In annotating a new sequence, we seek the sequence of labels that has the highest probability. Computing this most probable annotation was shown to be NP-hard by Lyngsø and Pedersen [R.B. Lyngsø, C.N.S. Pedersen, The consensus string problem and the complexity of comparing hidden Markov models, J. Comput. System Sci. 65 (3) (2002) 545-569]. We improve their result by showing that the problem is NP-hard even for a specific, fixed HMM, and we present efficient algorithms that compute the most probable annotation for a large class of HMMs, including abstractions of models previously used for transmembrane protein topology prediction and coding region detection. We also present a small experiment showing that the maximum probability annotation is more accurate than the labelings produced by simpler heuristics.
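To make the distinction between the most probable annotation and a simpler heuristic concrete, here is a minimal Python sketch. It is not taken from the paper: the three-state HMM, its probabilities, and all function names are hypothetical and chosen only for illustration. The probability of an annotation is the sum over every state path whose labels spell that annotation, and the exhaustive search over annotations is exponential in the sequence length, which is in keeping with the hardness result rather than with the efficient algorithms the abstract mentions. The heuristic shown for comparison is the common one of labeling the single most probable state path (Viterbi), which is assumed here to be representative of the simpler heuristics.

```python
from itertools import product

# Hypothetical toy HMM: label "b" is split across two states, label "a" has one.
# All numbers are made up for illustration.
STATES = ["A", "B1", "B2"]
LABEL = {"A": "a", "B1": "b", "B2": "b"}
INIT = {"A": 0.4, "B1": 0.3, "B2": 0.3}
TRANS = {s: {"A": 0.4, "B1": 0.3, "B2": 0.3} for s in STATES}
EMIT = {s: {"x": 1.0} for s in STATES}  # every state emits the symbol "x"


def annotation_probability(seq, annotation):
    """P(seq, annotation): forward algorithm restricted to states whose
    label matches the annotation at each position."""
    fwd = {
        s: INIT[s] * EMIT[s].get(seq[0], 0.0) if LABEL[s] == annotation[0] else 0.0
        for s in STATES
    }
    for sym, lab in zip(seq[1:], annotation[1:]):
        fwd = {
            t: (sum(fwd[s] * TRANS[s][t] for s in STATES) * EMIT[t].get(sym, 0.0)
                if LABEL[t] == lab else 0.0)
            for t in STATES
        }
    return sum(fwd.values())


def most_probable_annotation(seq):
    """Exhaustive search over all label sequences (exponential time)."""
    labels = sorted(set(LABEL.values()))
    return max(product(labels, repeat=len(seq)),
               key=lambda ann: annotation_probability(seq, ann))


def viterbi_labeling(seq):
    """Labels of the single most probable state path (the simpler heuristic)."""
    best = {s: (INIT[s] * EMIT[s].get(seq[0], 0.0), [s]) for s in STATES}
    for sym in seq[1:]:
        new = {}
        for t in STATES:
            p, path = max(((best[s][0] * TRANS[s][t], best[s][1]) for s in STATES),
                          key=lambda item: item[0])
            new[t] = (p * EMIT[t].get(sym, 0.0), path + [t])
        best = new
    _, path = max(best.values(), key=lambda item: item[0])
    return tuple(LABEL[s] for s in path)


if __name__ == "__main__":
    seq = "xx"
    print("Viterbi labeling:        ", viterbi_labeling(seq))
    print("Most probable annotation:", most_probable_annotation(seq))
```

In this toy model the best single path stays in state A, so the Viterbi labeling is "aa", yet the annotation "bb" collects probability from four different state paths and is more probable overall. The two answers can differ whenever a label is shared by several states.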