Explicit duration modelling in HMM/ANN hybrids

  • Authors: László Tóth; András Kocsor

  • Affiliations: Research Group on Artificial Intelligence, Szeged, Hungary (both authors)

  • Venue: TSD'05: Proceedings of the 8th International Conference on Text, Speech and Dialogue
  • Year: 2005

Abstract

In some languages, such as Finnish and Hungarian, phone duration is a very important distinctive acoustic cue. The conventional HMM speech recognition framework, however, is known to model duration information poorly. In this paper we compare different duration models within the framework of HMM/ANN hybrids. The tests are performed with two different hybrid models: the conventional one and the recently proposed “averaging hybrid”. Independent of the model configuration, we find that the usual exponential duration model has no detectable advantage over using no duration model at all. Similarly, applying the same fixed value for all state transition probabilities, as is usual with HMM/ANN systems, is found to have no influence on performance. However, the practical trick of imposing a minimum duration on the phones turns out to be very useful. The key part of the paper is the introduction of the gamma distribution duration model, which proves clearly superior to the exponential one, yielding a 12-20% relative improvement in the word error rate and thus justifying the use of sophisticated duration models in speech recognition.
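To illustrate the contrast between the duration models named in the abstract, here is a minimal Python sketch. It assumes the standard fact that a fixed self-loop probability in an HMM state implies a geometric (discrete exponential) duration distribution. The gamma fit by the method of moments, the hard minimum-duration cutoff, and all function names are illustrative assumptions. The abstract does not say how the authors estimate the gamma parameters or integrate the model into decoding.

```python
import math

def geometric_log_prob(d, self_loop):
    """Log-probability of staying exactly d frames in a state whose
    self-loop probability is `self_loop` (the implicit HMM duration model)."""
    if d < 1:
        return float("-inf")
    return (d - 1) * math.log(self_loop) + math.log(1.0 - self_loop)

def fit_gamma_moments(durations):
    """Estimate gamma (shape, scale) by the method of moments.
    Assumption: this estimator is chosen for simplicity, not taken from the paper."""
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / n
    if var <= 0:
        raise ValueError("durations must not all be identical")
    return mean * mean / var, var / mean

def gamma_log_pdf(d, shape, scale):
    """Log-density of a gamma distribution evaluated at duration d > 0."""
    if d <= 0:
        return float("-inf")
    return ((shape - 1) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))

def with_min_duration(log_prob_fn, min_frames):
    """Wrap a duration score so that durations below `min_frames` are forbidden.
    Assumption: a hard cutoff without renormalisation."""
    def wrapped(d):
        return float("-inf") if d < min_frames else log_prob_fn(d)
    return wrapped

# Hypothetical phone durations in frames, for illustration only.
observed = [8, 9, 10, 11, 12]
shape, scale = fit_gamma_moments(observed)
mean = sum(observed) / len(observed)
self_loop = 1.0 - 1.0 / mean  # geometric model with the same mean duration

for d in (1, 5, 10, 20):
    print(d, geometric_log_prob(d, self_loop), gamma_log_pdf(d, shape, scale))
```

The geometric score always favours the shortest possible duration, whatever the typical phone length is, whereas the gamma density peaks near the observed durations. That is one plausible reading of why the exponential model is reported to bring no benefit while a minimum-duration constraint and the gamma model do.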