Selecting hidden Markov model state number with cross-validated likelihood

  • Authors: Gilles Celeux; Jean-Baptiste Durand

  • Affiliations: Département de Mathématiques, INRIA Futurs, Orsay, Université Paris-Sud, Orsay Cedex, France 91405; Laboratoire Jean Kuntzmann, INRIA Rhône-Alpes, Grenoble Universités, Grenoble Cedex 9, France 38 041

  • Venue: Computational Statistics
  • Year: 2008

Abstract

The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states makes it possible to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, from a predictive perspective, it does not require that the sampling distribution belong to one of the competing models. However, computing cross-validated likelihood for hidden Markov models when only one training sample is available involves difficulties, since the data are not independent. Two approaches are proposed to compute cross-validated likelihood for a hidden Markov model. The first uses a deterministic half-sampling procedure; the second adapts the EM algorithm for hidden Markov models to take into account the randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare different versions of the cross-validated likelihood criterion with penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. These experiments highlight the promising behaviour of the deterministic half-sampling criterion.
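
The two ingredients mentioned in the abstract, a deterministic split of a single sequence and a likelihood that tolerates missing observations, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a Gaussian-emission HMM, an odd/even split as one plausible reading of "half-sampling", and hypothetical helper names (`half_sampling_masks`, `masked_loglik`). Parameter fitting by EM is omitted; the parameters below are placeholders standing in for estimates obtained from the training half.

```python
import numpy as np

def half_sampling_masks(n):
    """Assumed odd/even split: positions 0, 2, 4, ... train; 1, 3, 5, ... test."""
    train = np.arange(n) % 2 == 0
    return train, ~train

def masked_loglik(obs, observed, pi, A, means, sds):
    """Log-likelihood of the observed entries of `obs` under a Gaussian HMM.

    Scaled forward recursion. A time step with observed[t] == False is
    treated as missing: the state distribution is propagated through A
    but no emission density is multiplied in.
    """
    obs = np.asarray(obs, dtype=float)
    alpha = np.asarray(pi, dtype=float)
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = alpha @ A  # one-step state prediction
        if observed[t]:
            dens = np.exp(-0.5 * ((obs[t] - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))
            alpha = alpha * dens
        c = alpha.sum()  # scaling factor; equals 1 (up to rounding) at missing steps
        loglik += np.log(c)
        alpha = alpha / c
    return loglik

# Placeholder two-state parameters (illustrative values, not fitted).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
means = np.array([0.0, 2.0])
sds = np.array([1.0, 1.0])

obs = np.array([0.1, 2.3, -0.4, 1.9, 0.2, 2.1])
train, test = half_sampling_masks(len(obs))

# Held-out log-likelihood of the test half under the placeholder parameters.
print(masked_loglik(obs, test, pi, A, means, sds))
```

In a full procedure, one would fit a model for each candidate number of states on the training positions, score each on the held-out positions as above, and retain the number of states with the highest held-out log-likelihood.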