Acoustic modelling of subword units in the Isadora speech recognizer

  • Authors:
  • E. G. Schukat-Talamazzini; H. Niemann; W. Eckert; T. Kuhn; S. Rieck

  • Affiliations:
  • Universität Erlangen-Nürnberg; Universität Erlangen-Nürnberg; Universität Erlangen-Nürnberg; Universität Erlangen-Nürnberg; Universität Erlangen-Nürnberg

  • Venue:
  • ICASSP'92: Proceedings of the 1992 IEEE International Conference on Acoustics, Speech and Signal Processing - Volume 1
  • Year:
  • 1992

Abstract

This paper addresses the choice of suitable subword units for the HMM-based front end of a speaker-independent, large-vocabulary continuous speech dialog system (EVAR [1]). In contrast to the well-known approach of using context-dependent phone-like units (for instance, generalized triphones), we developed inventories of larger subword units, so-called context-freezing units (CFUs). CFU models can be considered an approximation to the highly desirable situation of having whole-word HMMs, given the limited training speech data at hand. Recognition experiments indicate an advantage of the context-freezing units over triphone/biphone/phone combinations in terms of word accuracy, at least for German speech. Using triphones whose contexts are generalized by means of broad phonetic classes, we achieved results comparable to those obtained with CFUs.