Towards robustness to fast speech in ASR

Authors:
N. Mirghafori;E. Fosler;N. Morgan
Affiliations:
Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA;Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA;Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA
Venue:
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Year:
1996

Citing 0
Cited 6

Toward adaptive conversational interfaces: Modeling speech convergence with animated personas

ACM Transactions on Computer-Human Interaction (TOCHI)
Automatic speech recognition and speech variability: A review

Speech Communication
Acoustic variability and automatic recognition of children's speech

Speech Communication
Towards age-independent acoustic modeling

Speech Communication
Audio hot spotting and retrieval using multiple features

SpeechIR '04 Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004
Exploring the effect of differences in the acoustic correlates of adults' and children's speech in the context of automatic speech recognition

EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech

Abstract

Psychoacoustic studies show that human listeners are sensitive to variations in speaking rate. Automatic speech recognition (ASR) systems are affected even more strongly: on many ASR systems, word error rates for fast speakers have been observed to be two to four times those for average speakers. In our earlier work (see Proceedings of EUROSPEECH '95, pp. 491-494, 1995), we studied the causes of the higher error and concluded that both acoustic-phonetic and phonological differences contribute to it. In this work, we study various measures for quantifying rate of speech (ROS) and use simple methods, based on ASR technology, to estimate the speaking rate of a novel utterance. We have also implemented mechanisms that make our ASR system more robust to fast speech. Using our ROS estimator to identify fast sentences in the test set, our rate-dependent system makes 24.5% fewer errors on the fastest sentences and 6.2% fewer errors on all sentences of the WSJ93 evaluation set, relative to the baseline HMM/MLP system.
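As a rough illustration of what a rate-of-speech measure can look like, the sketch below computes phones per second over the non-silent part of a time-aligned phone sequence and flags an utterance as fast when that rate exceeds a threshold. This is not the authors' estimator: the abstract does not say which ROS measure or decision rule was used, and the names PhoneSegment, rate_of_speech, is_fast, the "sil" label, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PhoneSegment:
    """One time-aligned phone (hypothetical structure, times in seconds)."""
    label: str
    start: float
    end: float


def rate_of_speech(segments: Iterable[PhoneSegment],
                   silence_labels: Iterable[str] = ("sil",)) -> float:
    """Return phones per second, ignoring segments labeled as silence."""
    silence = set(silence_labels)
    speech: List[PhoneSegment] = [s for s in segments if s.label not in silence]
    duration = sum(s.end - s.start for s in speech)
    if duration <= 0:
        return 0.0  # no speech found, so no meaningful rate
    return len(speech) / duration


def is_fast(segments: Iterable[PhoneSegment], threshold: float = 15.0) -> bool:
    """Flag an utterance as fast. The threshold is an assumed value,
    not one taken from the paper."""
    return rate_of_speech(segments) > threshold


# Example alignment: one silence segment followed by two phones.
example = [
    PhoneSegment("sil", 0.0, 1.0),
    PhoneSegment("f", 1.0, 1.25),
    PhoneSegment("ae", 1.25, 1.5),
]
example_ros = rate_of_speech(example)
```

In a rate-dependent setup of the kind the abstract describes, a flag like this could be used to route fast utterances to models adapted for fast speech, with the alignment coming from a first recognition pass.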