Implementation aspects of large vocabulary recognition based on intraword and interword phonetic units

Authors:
R. Pieraccini; C. H. Lee; E. Giachin; L. R. Rabiner
Venue:
HLT '90 Proceedings of the workshop on Speech and Natural Language
Year:
1990

Abstract

Most large vocabulary speech recognition systems consist of a training algorithm and a recognition structure, the latter being essentially a search for the best path through a rather large decoding network. Although the performance of the recognizer is crucially tied to the details of the training procedure, the recognition structure must be efficient in terms of computation and memory, and accurate in terms of actually determining the best path through the lattice, so that a wide range of training (sub-word unit creation) strategies can be evaluated in a reasonable time period. We have considered an architecture that combines several well-known procedures (beam search, compiled network, etc.) with some new ideas (stacks of active network nodes, likelihood computation on demand, guided search, etc.) to implement a search procedure that maintains the accuracy of the full search but can decode a single sentence in about one minute of computing time (about 20 times real time) on a vectorized, concurrent processor. This paper describes how we achieved this significant computational reduction.
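
As an illustration only, two of the ideas named in the abstract (beam search over a compiled network, and likelihood computation on demand) can be sketched as follows. This is a minimal toy under stated assumptions, not the authors' implementation: the function beam_viterbi, the four-node network ARCS, the scoring function toy_log_likelihood, and the beam width are all hypothetical, and the sketch does not attempt to model the stacks of active nodes or the guided search.

```python
import math
from typing import Callable, Dict, List, Tuple

def beam_viterbi(
    n_frames: int,
    arcs: Dict[int, List[Tuple[int, float]]],    # node -> [(next node, log transition prob)]
    start: int,
    final: int,
    log_likelihood: Callable[[int, int], float],  # (node, frame) -> acoustic log-likelihood
    beam: float,
) -> Tuple[float, List[int], int]:
    """Return (best log score, best path, number of likelihood evaluations)."""
    evaluations = 0
    # Active list: only nodes that survived pruning in the previous frame.
    active = {start: (0.0, [start])}
    for t in range(n_frames):
        cache: Dict[int, float] = {}  # likelihoods computed on demand, once per node per frame
        candidates: Dict[int, Tuple[float, List[int]]] = {}
        for node, (score, path) in active.items():
            for nxt, log_trans in arcs.get(node, []):
                if nxt not in cache:
                    cache[nxt] = log_likelihood(nxt, t)
                    evaluations += 1
                new_score = score + log_trans + cache[nxt]
                # Viterbi recombination: keep the best-scoring path into each node.
                if nxt not in candidates or new_score > candidates[nxt][0]:
                    candidates[nxt] = (new_score, path + [nxt])
        if not candidates:
            return -math.inf, [], evaluations
        best = max(s for s, _ in candidates.values())
        # Beam pruning: drop nodes scoring more than `beam` below the frame's best.
        active = {n: sp for n, sp in candidates.items() if sp[0] >= best - beam}
    if final not in active:
        return -math.inf, [], evaluations
    score, path = active[final]
    return score, path, evaluations

# Hypothetical toy network: node 0 is a non-emitting start, node 3 is final.
LOG_HALF = math.log(0.5)
ARCS = {
    0: [(1, LOG_HALF), (2, LOG_HALF)],
    1: [(1, LOG_HALF), (3, LOG_HALF)],
    2: [(2, LOG_HALF), (3, LOG_HALF)],
}

def toy_log_likelihood(node: int, frame: int) -> float:
    # Assumed scores: node 2 matches the input poorly at every frame.
    return -10.0 if node == 2 else -1.0

pruned = beam_viterbi(3, ARCS, 0, 3, toy_log_likelihood, beam=5.0)
full = beam_viterbi(3, ARCS, 0, 3, toy_log_likelihood, beam=math.inf)
print("beam search:", pruned)
print("full search:", full)
```

On this toy network the pruned and full searches return the same best path, while the pruned search evaluates fewer likelihoods because nodes that fall outside the beam never request a score in later frames. In general a beam that is too narrow can prune the true best path, so the beam width has to be chosen with care.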