Use of semi-Markov models for speaker-independent phoneme recognition

  • Authors:
  • Nimal Ratnayake; Michael Savic; Jeffrey Sorensen

  • Affiliations:
  • Electrical, Computer, and Systems Engineering Dept., Rensselaer Polytechnic Institute, Troy, NY;Electrical, Computer, and Systems Engineering Dept., Rensselaer Polytechnic Institute, Troy, NY;Electrical, Computer, and Systems Engineering Dept., Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • ICASSP '92: Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 1
  • Year:
  • 1992

Abstract

Hidden Markov models (HMMs) are used to model speech in many areas of speech processing. One characteristic of the HMM is that the time spent in a particular state, the state occupancy, is geometrically distributed. This becomes a serious limitation and leads to inaccurate modeling when HMMs are used for phoneme recognition. In this work, we use hidden semi-Markov models (HSMMs) to overcome this limitation. Semi-Markov models generalize Markov chains by allowing the state occupancy to be modeled explicitly by an arbitrary probability mass function. We describe the state occupancies with non-parametric distributions rather than parametric ones such as the Gamma, Poisson, or binomial, because analysis of actual data shows that the durations of some phonemes cannot be approximated by any of these. Preliminary tests using only the LPC cepstrum as features show that the HSMM increases phoneme recognition accuracy from 48.4%, obtained with an HMM, to 53.7%.
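
The geometric-occupancy limitation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names and the duration values are hypothetical, chosen only to contrast the duration distribution implied by an HMM self-transition probability a, P(d) = a^(d-1) (1 - a), with a non-parametric (histogram) estimate of the kind an HSMM could use.

```python
from collections import Counter


def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf implied by an HMM state with self-transition probability a:
    P(d) = a**(d - 1) * (1 - a), for d = 1..max_duration (truncated)."""
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


def empirical_duration_pmf(durations, max_duration):
    """Non-parametric pmf: relative frequency of each observed duration."""
    counts = Counter(durations)
    total = len(durations)
    return [counts.get(d, 0) / total for d in range(1, max_duration + 1)]


# Hypothetical state durations in frames (illustrative only, not data from the paper).
observed = [4, 5, 5, 6, 6, 6, 7, 7, 8, 6]
max_d = 10

empirical = empirical_duration_pmf(observed, max_d)

# Choose a so that the geometric mean duration 1 / (1 - a) equals the sample mean.
mean_d = sum(observed) / len(observed)
a = 1.0 - 1.0 / mean_d
geometric = geometric_duration_pmf(a, max_d)

for d in range(1, max_d + 1):
    print(f"d={d:2d}  geometric={geometric[d - 1]:.3f}  empirical={empirical[d - 1]:.3f}")
```

Even with the mean duration matched, the geometric pmf always peaks at a duration of one frame and decays from there, whereas the histogram can place its peak wherever the observed durations cluster.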