Multiple approaches to robust speech recognition

Authors:
Richard M. Stern; Fu-Hua Liu; Yoshiaki Ohshima; Thomas M. Sullivan; Alejandro Acero
Affiliations:
Carnegie Mellon University, Pittsburgh, PA (all authors)
Venue:
HLT '91 Proceedings of the workshop on Speech and Natural Language
Year:
1992

Abstract

This paper compares several approaches to robust speech recognition. We review CMU's ongoing research in the use of acoustical pre-processing to achieve robust speech recognition, and we present the results of the first evaluation of pre-processing in the DARPA standard ATIS domain for spoken language systems. We also describe and compare the effectiveness of three complementary methods of signal processing for robust speech recognition: acoustical pre-processing, microphone array processing, and physiologically motivated models of peripheral signal processing. We report recognition error rates for these three approaches, both in isolation and in combination, on the speaker-independent continuous alphanumeric census speech recognition task.
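As a hedged illustration of what acoustical pre-processing can involve, the sketch below implements cepstral mean normalization (CMN), a simple and widely used channel-compensation technique. It is not necessarily one of the algorithms evaluated in this paper. The function name `cepstral_mean_normalize` and the list-of-lists frame format are assumptions made for illustration. The underlying idea is that a fixed linear channel, such as a different microphone, multiplies the spectrum. That becomes an approximately constant additive offset in the cepstral domain, so subtracting the per-utterance mean removes it.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean of each cepstral coefficient.

    Hypothetical helper for illustration only.
    frames: list of equal-length lists, one cepstral vector per analysis frame.
    Returns a new list of frames with the utterance mean removed.
    """
    if not frames:
        return []
    dim = len(frames[0])
    n = len(frames)
    # Mean of each cepstral coefficient across all frames of the utterance.
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    # A stationary linear channel adds a roughly constant cepstral offset;
    # subtracting the mean cancels that offset (and the speech mean with it).
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]


# Usage: the same "utterance" seen through two channels that differ only by a
# constant cepstral offset normalizes to identical features.
clean = [[1.0, 2.0], [3.0, 4.0]]
offset = [0.5, -1.0]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]
print(cepstral_mean_normalize(clean))    # [[-1.0, -1.0], [1.0, 1.0]]
print(cepstral_mean_normalize(shifted))  # [[-1.0, -1.0], [1.0, 1.0]]
```

CMN handles only convolutional (channel) distortion. Additive background noise is not a constant cepstral offset, which is why more elaborate compensation methods are used when noise is present.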