Acoustic modelling of subword units in the Isadora speech recognizer

Authors:
E. G. Schukat-Talamazzini;H. Niemann;W. Eckert;T. Kuhn;S. Rieck
Affiliations:
Universität Erlangen-Nürnberg;Universität Erlangen-Nürnberg;Universität Erlangen-Nürnberg;Universität Erlangen-Nürnberg;Universität Erlangen-Nürnberg
Venue:
ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Year:
1992

Citing 3
Cited 0

Automatic Speech Recognition: The Development of the Sphinx Recognition System

Das ISADORA-System - ein akustisch-phonetisches Netzwerk zur automatischen Spracherkennung

Mustererkennung 1991, 13. DAGM-Symposium
An investigation of PLP and IMELDA acoustic representations and of their potential for combination

ICASSP '91: Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing


Abstract

This paper addresses the choice of suitable subword units for the HMM-based front end of a speaker-independent, large-vocabulary, continuous-speech dialog system (EVAR [1]). In contrast to the well-known approach of using context-dependent phone-like units (for instance, generalized triphones), we developed inventories of larger subword units, so-called context-freezing units (CFU). CFU models can be considered an approximation to the highly desirable situation of having whole-word HMMs, given the limited training speech data at hand. Recognition experiments indicate an advantage of the context-freezing units over triphone/biphone/phone combinations in terms of word accuracy, at least for German speech. Using triphones with contexts generalized by means of broad phonetic classes, we achieved results comparable to those of the CFUs.