Exploring the effect of differences in the acoustic correlates of adults' and children's speech in the context of automatic speech recognition

Authors:
Shweta Ghai;Rohit Sinha
Affiliations:
Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, India;Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, India
Venue:
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech
Year:
2010

Abstract

This work explores how mismatches between adults' and children's speech, arising from differences in several acoustic correlates, affect automatic speech recognition (ASR) performance under mismatched conditions. The correlates studied are the pitch, the speaking rate, the glottal parameters (open quotient, return quotient, and speech quotient), and the formant frequencies. The effect of each correlate is quantified by explicitly normalizing it with existing techniques from the literature. An initial study on a connected digit recognition task shows that, among these parameters, only the formant frequencies, the pitch, and the speaking rate affect ASR performance. Normalizing these three parameters yields significant improvements in performance. With combined normalization of the pitch, the speaking rate, and the formant frequencies, relative improvements of 80% and 70% over the baseline are obtained for children's and adults' speech recognition, respectively, under mismatched conditions.
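The "relative improvement" figures quoted above can be read as a fractional reduction of an error measure with respect to the baseline. The sketch below shows that arithmetic only. It assumes the measure is an error rate such as word error rate, which the abstract does not state, and the function name and the numbers are hypothetical, chosen purely for illustration rather than taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional reduction in error relative to the baseline.

    Assumption: 'error' is a rate such as word error rate (in percent);
    the abstract does not say which metric the authors used.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates, NOT results from the paper.
baseline_wer = 10.0    # error rate before normalization
normalized_wer = 2.0   # error rate after normalization

improvement = relative_improvement(baseline_wer, normalized_wer)
print(f"relative improvement: {improvement:.0%}")
```

Under this reading, an 80% relative improvement means the normalized system keeps one fifth of the baseline error, whatever the absolute baseline value is.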