Selecting hidden Markov model state number with cross-validated likelihood

Authors:
Gilles Celeux;Jean-Baptiste Durand
Affiliations:
Département de Mathématiques, INRIA Futurs, Orsay, Université Paris-Sud, Orsay Cedex, France 91405;Laboratoire Jean Kuntzmann, INRIA Rhône-Alpes, Grenoble Universités, Grenoble Cedex 9, France 38 041
Venue:
Computational Statistics
Year:
2008

Abstract

The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states makes it possible to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, from a predictive perspective, it does not require that the sampling distribution belong to one of the competing models. However, computing the cross-validated likelihood for hidden Markov models when only one training sample is available is difficult because the data are not independent. Two approaches are proposed to compute the cross-validated likelihood for a hidden Markov model. The first uses a deterministic half-sampling procedure; the second adapts the EM algorithm for hidden Markov models to account for the randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare different versions of the cross-validated likelihood criterion with penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. These experiments highlight the promising behaviour of the deterministic half-sampling criterion.
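
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the idea both approaches rely on: evaluating an HMM likelihood when some positions of a single sequence are held out. It assumes Gaussian emissions and treats held-out positions as missing by marginalising them in a scaled forward recursion. The function names (`log_likelihood_with_missing`, `odd_even_masks`), the odd/even reading of "half-sampling", and the parameter values are illustrative assumptions, not taken from the paper, and the EM fitting step is omitted.

```python
import numpy as np


def log_likelihood_with_missing(x, observed, pi, A, means, sds):
    """Log-likelihood of the observed positions of x under a Gaussian HMM.

    Positions where observed[t] is False are marginalised out: their
    emission factor is set to 1, so only the transition matrix acts there.
    (Hypothetical helper; a sketch, not the authors' implementation.)
    """
    x = np.asarray(x, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    # Gaussian emission densities, shape (T, K).
    dens = np.exp(-0.5 * ((x[:, None] - means) / sds) ** 2)
    dens /= sds * np.sqrt(2.0 * np.pi)
    dens[~observed] = 1.0  # missing value: no information about the state

    loglik = 0.0
    alpha = pi * dens[0]
    for t in range(len(x)):
        if t > 0:
            alpha = (alpha @ A) * dens[t]
        c = alpha.sum()          # scaling constant
        loglik += np.log(c)
        alpha = alpha / c        # rescale to avoid underflow
    return loglik


def odd_even_masks(T):
    """Assumed half-sampling split: x_1, x_3, ... versus x_2, x_4, ... (1-based)."""
    first_half = np.arange(T) % 2 == 0
    return first_half, ~first_half


# Placeholder data and parameters (not fitted, purely illustrative).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([-1.0, 1.0])
sds = np.array([1.0, 1.0])

train_mask, test_mask = odd_even_masks(len(x))
ll_train = log_likelihood_with_missing(x, train_mask, pi, A, means, sds)
ll_test = log_likelihood_with_missing(x, test_mask, pi, A, means, sds)
print(ll_train, ll_test)
```

In an actual model-selection loop, the parameters would first be estimated from the training positions only, for each candidate number of states, and the held-out positions would then be scored with those estimates. The placeholder parameters above skip that fitting step.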