Techniques to achieve an accurate real-time large-vocabulary speech recognition system

Authors:
Hy Murveit; Peter Monaco; Vassilios Digalakis; John Butzberger
Affiliations:
SRI International, Menlo Park, California (all authors)
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Abstract

In addressing the problem of achieving high-accuracy real-time speech recognition systems, we focus on recognizing speech from ARPA's 20,000-word Wall Street Journal (WSJ) task, using current UNIX workstations. We have found that our standard approach, a narrow-beam Viterbi search for simple discrete-density hidden Markov models (HMMs), works in real time but with only very low accuracy. Our most accurate algorithms recognize speech many times slower than real time. Our (as yet unattained) goal is to recognize speech in real time at or near full accuracy.

We describe the speed/accuracy trade-offs associated with several techniques used in a one-pass speech recognition framework:

• Trade-offs associated with reducing the acoustic modeling resolution of the HMMs (e.g., output-distribution type, number of parameters, cross-word modeling)
• Trade-offs associated with using lexicon trees, and techniques for implementing full and partial bigram grammars with those trees
• Computation of Gaussian probabilities, which is the most time-consuming aspect of our highest-accuracy system, and techniques that allow us to reduce the number of Gaussian probabilities computed with little or no impact on speech recognition accuracy

Our results show that tree-based modeling techniques used with appropriate acoustic modeling approaches achieve real-time performance on current UNIX workstations at about a 30% error rate for the WSJ task. The results also show that we can dramatically reduce the computational complexity of our more accurate but slower modeling alternatives so that they are near the speed necessary for real-time performance in a multipass search. Our near-future goal is to combine these two technologies so that real-time, high-accuracy large-vocabulary speech recognition can be achieved.
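
To illustrate why lexicon trees can reduce search effort, the following minimal Python sketch builds a pronunciation prefix tree and compares the number of phone arcs with a flat lexicon that has one linear phone sequence per word. It is an illustration under stated assumptions, not the authors' implementation: the names `TreeNode`, `build_lexicon_tree`, and `count_arcs`, and the four-word toy lexicon with its phone symbols, are all hypothetical.

```python
from typing import Dict, List


class TreeNode:
    """One node of a pronunciation prefix tree (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TreeNode"] = {}  # phone label -> child node
        self.words: List[str] = []  # words whose pronunciation ends here


def build_lexicon_tree(lexicon: Dict[str, List[str]]) -> TreeNode:
    """Insert each pronunciation so that words with a common prefix share arcs."""
    root = TreeNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, TreeNode())
        node.words.append(word)
    return root


def count_arcs(node: TreeNode) -> int:
    """Count the phone arcs in the subtree below `node`."""
    return sum(1 + count_arcs(child) for child in node.children.values())


# Toy pronunciations, assumed for illustration only.
LEXICON = {
    "stock": ["s", "t", "aa", "k"],
    "stop": ["s", "t", "aa", "p"],
    "street": ["s", "t", "r", "iy", "t"],
    "trade": ["t", "r", "ey", "d"],
}

tree = build_lexicon_tree(LEXICON)
linear_arcs = sum(len(phones) for phones in LEXICON.values())  # flat lexicon
tree_arcs = count_arcs(tree)  # prefix-shared lexicon
print(f"flat lexicon arcs: {linear_arcs}, tree arcs: {tree_arcs}")
```

In this toy case the tree needs 12 arcs where the flat lexicon needs 17, because "stock", "stop", and "street" share their initial phones. The sketch also hints at why bigram grammars need special handling with trees: a word's identity is known only at the node where its pronunciation ends, so a word-dependent language model score cannot simply be applied when the word is entered.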