Continuously variable duration hidden Markov models for automatic speech recognition
Computer Speech and Language
Hidden Markov models (HMMs) have been used to model speech in many areas of speech processing. One characteristic of the HMM is that the time spent in a particular state, or state occupancy, is geometrically distributed. This becomes a serious limitation and results in inaccurate modeling when HMMs are used for phoneme recognition. In this work, we use hidden semi-Markov models (HSMMs) to overcome this limitation. Semi-Markov models generalize Markov chains by allowing the state occupancy to be modeled explicitly by an arbitrary probability mass function. We describe the state occupancies with non-parametric distributions rather than parametric ones such as the Gamma, Poisson or Binomial, because analysis of actual data shows that the durations of some phonemes cannot be approximated by any of these. Preliminary tests using only the LPC cepstrum as features show that the HSMM increased phoneme recognition accuracy from the 48.4% obtained with an HMM to 53.7%.
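The contrast between the two duration models can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the self-loop probability of 0.8, and the list of phoneme durations (in frames) are hypothetical values chosen only for illustration. The first function gives the geometric occupancy distribution implied by an HMM state with self-transition probability a, P(d) = a^(d-1)(1 - a); the second estimates a non-parametric probability mass function from observed durations by relative frequency, which is one simple way such a distribution could be obtained.

```python
from collections import Counter


def geometric_duration_pmf(self_loop_prob, max_duration):
    """Occupancy pmf implied by an HMM state: P(d) = a^(d-1) * (1 - a).

    Returns probabilities for d = 1..max_duration (truncated, so they
    sum to slightly less than 1).
    """
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


def empirical_duration_pmf(durations, max_duration):
    """Non-parametric pmf: relative frequency of each observed duration."""
    counts = Counter(d for d in durations if 1 <= d <= max_duration)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no durations fall within 1..max_duration")
    return [counts.get(d, 0) / total for d in range(1, max_duration + 1)]


# Hypothetical phoneme durations in frames (illustrative, not real data).
example_durations = [4, 5, 5, 6, 6, 6, 7, 7, 8]

geo = geometric_duration_pmf(self_loop_prob=0.8, max_duration=10)
emp = empirical_duration_pmf(example_durations, max_duration=10)

for d, (g, e) in enumerate(zip(geo, emp), start=1):
    print(f"d={d:2d}  geometric={g:.3f}  empirical={e:.3f}")
```

Whatever value of a is chosen, the geometric distribution always assigns its highest probability to a duration of one frame, whereas the empirical distribution can peak at any duration the data supports. This is the mismatch that explicit duration modeling is meant to remove.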