The consensus string problem and the complexity of comparing hidden Markov models

  • Authors:
  • Rune B. Lyngsø; Christian N. S. Pedersen

  • Affiliations:
  • Department of Computer Science, University of California at Santa Cruz, Santa Cruz, CA; BRICS, Department of Computer Science, University of Aarhus, 8000 Århus C, Denmark

  • Venue:
  • Journal of Computer and System Sciences - Computational biology 2002
  • Year:
  • 2002


Abstract

The basic theory of hidden Markov models was developed and applied to problems in speech recognition in the late 1960s, and has since been applied to numerous other problems, e.g. biological sequence analysis. Most applications of hidden Markov models are based on efficient algorithms for computing the probability of generating a given string, or computing the most likely path generating a given string. In this paper we consider the problem of computing the most likely string, or consensus string, generated by a given model, and its implications for the complexity of comparing hidden Markov models. We show that computing the consensus string, and approximating its probability within any constant factor, is NP-hard, and that the same holds for the closely related labeling problem for class hidden Markov models. Furthermore, we establish the NP-hardness of comparing two hidden Markov models under the L∞- and L1-norms. We discuss the applicability of the technique used for proving the hardness of comparing two hidden Markov models under the L1-norm to other measures of distance between probability distributions. In particular, we show that it cannot be used for proving NP-hardness of determining the Kullback-Leibler distance between two hidden Markov models, or of comparing them under the Lk-norm for any fixed even integer k.
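To contrast with the NP-hard consensus string problem, the "most likely path for a given string" problem the abstract refers to is solvable in polynomial time by dynamic programming (the Viterbi algorithm). Below is a minimal sketch in Python; the toy HMM (states `H`/`L`, symbols `a`/`b`) and all names are illustrative assumptions, not taken from the paper.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path generating `obs`.

    Dynamic programming in log-space: V[t][s] is the log-probability of
    the best path that ends in state s after emitting obs[0..t].
    Runs in O(len(obs) * len(states)^2) time -- polynomial, unlike the
    consensus string problem, which the paper shows to be NP-hard.
    """
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
          for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor maximizes path score into s.
            best = max(states, key=lambda r: V[t - 1][r] + math.log(trans_p[r][s]))
            V[t][s] = (V[t - 1][best] + math.log(trans_p[best][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = best
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy two-state HMM: H prefers to emit 'a', L prefers 'b' (hypothetical numbers).
states = ["H", "L"]
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.5, "L": 0.5}}
emit_p = {"H": {"a": 0.9, "b": 0.1}, "L": {"a": 0.1, "b": 0.9}}

print(viterbi(list("aab"), states, start_p, trans_p, emit_p))  # → ['H', 'H', 'L']
```

The hardness result in the paper concerns the converse direction: fixing the model and searching over all strings for the one with maximum total probability, which cannot be reduced to a single Viterbi-style recurrence.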