Explicit duration modelling in HMM/ANN hybrids

Authors:
László Tóth; András Kocsor
Affiliations:
Research Group on Artificial Intelligence, Szeged, Hungary; Research Group on Artificial Intelligence, Szeged, Hungary
Venue:
TSD'05 Proceedings of the 8th international conference on Text, Speech and Dialogue
Year:
2005

Abstract

In some languages, such as Finnish or Hungarian, phone duration is a very important distinctive acoustic cue. The conventional HMM speech recognition framework, however, is known to model duration information poorly. In this paper we compare different duration models within the framework of HMM/ANN hybrids. The tests are performed with two different hybrid models: the conventional one and the recently proposed “averaging hybrid”. Independent of the model configuration, we find that the usual exponential duration model has no detectable advantage over using no duration model at all. Similarly, applying the same fixed value for all state transition probabilities, as is usual with HMM/ANN systems, is found to have no influence on performance. However, the practical trick of imposing a minimum duration on the phones turns out to be very useful. The key part of the paper is the introduction of the gamma distribution duration model, which proves clearly superior to the exponential one, yielding a 12-20% relative improvement in the word error rate and thus justifying the use of sophisticated duration models in speech recognition.
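To make the contrast between the duration models concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the exponential model is the geometric duration distribution implied by a fixed HMM self-transition probability, that the gamma parameters are fitted by moment matching (the abstract does not say how they were estimated), and that the minimum-duration trick simply forbids segments shorter than a threshold. All function names and the sample durations are hypothetical.

```python
import math


def geometric_log_prob(d, self_loop):
    """Log-probability of staying exactly d frames (d >= 1) in a state
    whose self-transition probability is self_loop. This is the implicit
    'exponential' duration model of a standard HMM state."""
    return (d - 1) * math.log(self_loop) + math.log(1.0 - self_loop)


def gamma_log_pdf(d, shape, scale):
    """Log-density of a gamma distribution evaluated at duration d > 0."""
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))


def fit_gamma_by_moments(durations):
    """Moment-matching estimate (an assumption of this sketch):
    shape = mean^2 / variance, scale = variance / mean."""
    n = len(durations)
    mean = sum(durations) / n
    var = sum((x - mean) ** 2 for x in durations) / n
    return mean * mean / var, var / mean


def with_min_duration(log_prob, d, min_dur):
    """Minimum-duration trick: rule out segments shorter than min_dur."""
    return log_prob if d >= min_dur else float("-inf")


# Hypothetical phone durations in frames, for illustration only.
durations = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]
mean_duration = sum(durations) / len(durations)

shape, scale = fit_gamma_by_moments(durations)
# Self-loop probability giving the geometric model the same mean duration.
self_loop = 1.0 - 1.0 / mean_duration

for d in (1, 4, 7, 10, 15):
    print(d,
          round(geometric_log_prob(d, self_loop), 3),
          round(gamma_log_pdf(d, shape, scale), 3))
```

The geometric model always assigns its highest probability to a one-frame segment, whereas the fitted gamma density peaks near the typical duration. In a decoder, such a log-score would typically be added to the acoustic score of a phone segment, usually with a tunable weight.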