Recent progress in robust vocabulary-independent speech recognition

  • Authors:
  • Hsiao-Wuen Hon; Kai-Fu Lee

  • Venue:
  • HLT '91 Proceedings of the workshop on Speech and Natural Language
  • Year:
  • 1991

Abstract

This paper reports recent efforts to improve the performance of CMU's robust vocabulary-independent (VI) speech recognition systems on the DARPA speaker-independent resource management task. The improvements are evaluated on 320 sentences randomly selected from the DARPA June 88, February 89, and October 89 test sets. Our first improvement involves more detailed acoustic modeling: we incorporated more dynamic features computed from the LPC cepstra, reducing error by 15% over the baseline system. Our second improvement comes from a larger training database. With more training data, our third improvement, more detailed subword modeling, becomes feasible: we incorporated word-boundary context into our VI subword models, which resulted in a 30% error reduction. Finally, we used decision-tree allophone clustering to find more suitable models for the subword units not covered in the training set, further reducing error by 17%. All the techniques combined reduced the VI error rate on the resource management task from 11.1% to 5.4% (and from 15.4% to 7.4% when training and testing were carried out under different recording environments). This vocabulary-independent performance has exceeded our vocabulary-dependent performance.
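The first improvement mentioned above, dynamic features computed from the LPC cepstra, is commonly realized as regression-based "delta" coefficients over a short window of frames. A minimal sketch, assuming a simple linear-regression delta over a ±2-frame window (the exact window and feature set used in the paper are not specified in this abstract):

```python
# Sketch: first- and second-order dynamic (delta) features from cepstral
# frames. The window size and edge handling here are assumptions, not the
# paper's exact configuration.

def delta(frames, window=2):
    """Linear-regression time derivative of per-frame feature vectors.

    frames: list of feature vectors (lists of floats), one per time step.
    Returns a list of delta vectors of the same length as `frames`.
    """
    n = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    deltas = []
    for t in range(n):
        vec = []
        for d in range(dim):
            num = 0.0
            for k in range(1, window + 1):
                right = frames[min(t + k, n - 1)][d]  # clamp at the edges
                left = frames[max(t - k, 0)][d]
                num += k * (right - left)
            vec.append(num / denom)
        deltas.append(vec)
    return deltas

# Usage: augment the static cepstra with delta and delta-delta features.
cepstra = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # toy 1-D cepstral track
d1 = delta(cepstra)
d2 = delta(d1)
augmented = [c + a + b for c, a, b in zip(cepstra, d1, d2)]
```

On a linearly rising track like the toy example, the interior first-order deltas come out as the slope (1.0) and the second-order deltas as roughly zero, which is the intended behavior of a regression-based derivative.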