Direct maximization of the likelihood of a hidden Markov model

Authors:
Rolf Turner
Affiliations:
Department of Mathematics and Statistics, University of New Brunswick, Fredericton, Canada; Starpath Project, University of Auckland, New Zealand
Venue:
Computational Statistics & Data Analysis
Year:
2008

Abstract

Ever since the introduction of hidden Markov models by Baum and his co-workers, the method of choice for fitting such models has been maximum likelihood via the EM algorithm. In recent years it has been noted that the gradient and Hessian of the log-likelihood of hidden Markov and related models can be computed in parallel with the filtering process that computes the likelihood itself. Various authors have used, or proposed using, this idea to maximize the likelihood directly, without the EM algorithm. In this paper we discuss an implementation of such an approach. We have found that a straightforward implementation of Newton's method sometimes works but is unreliable. A form of the Levenberg-Marquardt algorithm appears to provide excellent reliability. Two rather complex examples of fitting hidden Markov models with this algorithm are given. In the first, the algorithm ran more than six times faster than the EM algorithm. The second example turned out to be problematic, somewhat interestingly, in that the maximum likelihood estimator appears to be inconsistent. Whatever its merits, this estimator is calculated much faster by Levenberg-Marquardt than by EM. We also compared the Levenberg-Marquardt algorithm, applied to the first example, with a generic numerical maximization procedure. The Levenberg-Marquardt algorithm appeared to perform almost three times better than the generic procedure when the latter was supplied with analytic derivatives, and 19 times better when it was not.
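The two computational ingredients named in the abstract, evaluating the likelihood with a filtering (forward) recursion and maximizing it with a Levenberg-Marquardt-style damped Newton iteration, can be illustrated with a small sketch. It is not the paper's implementation and rests on several assumptions: a hypothetical two-state Poisson HMM, a uniform initial state distribution, and a logit/log parameterization. Unlike the approach described above, it approximates the gradient and Hessian by central finite differences instead of computing them analytically alongside the filter. The damping rule (divide or multiply `lam` by 10 after an accepted or rejected step) is a common textbook choice, not necessarily the author's variant. All function names are illustrative.

```python
import math

def poisson_logpmf(k, lam):
    # Log of the Poisson probability of count k under rate lam.
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def unpack(theta):
    # Hypothetical parameterization of a two-state Poisson HMM:
    # theta = (logit p12, logit p21, log rate1, log rate2), all unconstrained.
    p12 = 1.0 / (1.0 + math.exp(-theta[0]))
    p21 = 1.0 / (1.0 + math.exp(-theta[1]))
    P = [[1.0 - p12, p12], [p21, 1.0 - p21]]
    return P, [math.exp(theta[2]), math.exp(theta[3])]

def log_likelihood(theta, y):
    # Scaled forward (filtering) recursion with an assumed uniform initial
    # distribution; alpha holds the filtered state probabilities.
    P, rates = unpack(theta)
    alpha, ll = [0.5, 0.5], 0.0
    for t, k in enumerate(y):
        if t > 0:  # predict: push filtered probabilities through P
            alpha = [sum(alpha[i] * P[i][j] for i in range(2)) for j in range(2)]
        alpha = [alpha[j] * math.exp(poisson_logpmf(k, rates[j])) for j in range(2)]
        c = sum(alpha)  # c = p(y_t | y_1, ..., y_{t-1})
        ll += math.log(c)
        alpha = [a / c for a in alpha]  # update: renormalize
    return ll

def grad_hess(f, x, h=1e-3):
    # Central finite differences: a stand-in for the analytic derivatives
    # that the abstract describes computing alongside the filter.
    n = len(x)
    def at(*moves):  # evaluate f after shifting some coordinates of x
        z = list(x)
        for i, d in moves:
            z[i] += d
        return f(z)
    f0 = f(x)
    g = [(at((i, h)) - at((i, -h))) / (2 * h) for i in range(n)]
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = (at((i, h)) - 2 * f0 + at((i, -h))) / h**2
        for j in range(i + 1, n):
            H[i][j] = H[j][i] = (at((i, h), (j, h)) - at((i, h), (j, -h))
                                 - at((i, -h), (j, h)) + at((i, -h), (j, -h))) / (4 * h * h)
    return g, H

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def levenberg_marquardt(f, x, lam=1e-2, tol=1e-8, max_iter=200):
    # Maximize f: solve (-H + lam*I) d = g and accept d only if f increases
    # (then shrink lam toward a pure Newton step); otherwise enlarge lam.
    fx, n = f(x), len(x)
    for _ in range(max_iter):
        g, H = grad_hess(f, x)
        while True:
            A = [[-H[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
            try:
                d = solve(A, g)
                x_new = [xi + di for xi, di in zip(x, d)]
                f_new = f(x_new)
            except (ZeroDivisionError, OverflowError, ValueError):
                f_new = -math.inf  # a failed step counts as a rejected one
            if f_new > fx:
                break
            lam *= 10.0
            if lam > 1e12:
                return x, fx  # no improving step found: stop
        done = f_new - fx < tol
        x, fx, lam = x_new, f_new, lam / 10.0
        if done:
            break
    return x, fx

if __name__ == "__main__":
    # Illustrative counts only; not data from the paper.
    y = [0, 1, 0, 2, 1, 0, 8, 10, 7, 9, 12, 1, 0, 2, 9, 11, 8, 0, 1, 0]
    start = [-1.0, -1.0, 0.0, math.log(5.0)]
    theta_hat, ll_hat = levenberg_marquardt(lambda th: log_likelihood(th, y), start)
    print(log_likelihood(start, y), ll_hat, unpack(theta_hat))
```

Setting `lam` to 0 turns the update into a plain Newton step, the variant the abstract reports as unreliable; the accept/reject rule on `lam` is what supplies robustness far from the optimum. The finite-difference Hessian costs on the order of n² extra likelihood evaluations per iteration for n parameters, which is why computing the derivatives analytically during the forward pass matters in practice.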