A study of speech recognition for children and the elderly

Authors:
J. G. Wilpon; C. N. Jacobsen
Affiliations:
AT&T Bell Labs., Murray Hill, NJ, USA; AT&T Bell Labs., Murray Hill, NJ, USA
Venue:
ICASSP '96 Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Year:
1996

Cited by (19):

Designing robust multimodal systems for universal access
WUAUC'01 Proceedings of the 2001 EC/NSF workshop on Universal accessibility of ubiquitous computing: providing for the elderly

Feature vs. Model Based Vocal Tract Length Normalization for a Speech Recognition-Based Interactive Toy
AMT '01 Proceedings of the 6th International Computer Science Conference on Active Media Technology

Toward adaptive conversational interfaces: Modeling speech convergence with animated personas
ACM Transactions on Computer-Human Interaction (TOCHI)

Audio-visual cues distinguishing self- from system-directed speech in younger and older adults
ICMI '05 Proceedings of the 7th international conference on Multimodal interfaces

Automatic speech recognition and speech variability: A review
Speech Communication

Acoustic variability and automatic recognition of children's speech
Speech Communication

Acceptance of speech recognition by physicians: A survey of expectations, experiences, and social influence
International Journal of Human-Computer Studies

Towards age-independent acoustic modeling
Speech Communication

Speech Input from Older Users in Smart Environments: Challenges and Perspectives
UAHCI '09 Proceedings of the 5th International Conference on Universal Access in Human-Computer Interaction. Part II: Intelligent and Ubiquitous Interaction Environments

Automatic speech recognition systems for the evaluation of voice and speech disorders in head and neck cancer
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech

A review of ASR technologies for children's speech
Proceedings of the 2nd Workshop on Child, Computer and Interaction

Improved automatic speech recognition through speaker normalization
Computer Speech and Language

Ambient intelligence and multimodality
UAHCI'07 Proceedings of the 4th international conference on Universal access in human-computer interaction: ambient interaction

Ageing voices: the effect of changes in voice parameters on ASR performance
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech

On the impact of children's emotional speech on acoustic and language models
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech

Exploring the effect of differences in the acoustic correlates of adults' and children's speech in the context of automatic speech recognition
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech

Age and gender detection in the I-DASH project
ACM Transactions on Speech and Language Processing (TSLP)

Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech
Pattern Recognition Letters

The CARES corpus: a database of older adult actor simulated emergency dialogue for developing a personal emergency response system
International Journal of Speech Technology

Abstract

Although children and the elderly have obvious needs for voice-operated interfaces, hardly anything is known about how current automatic speech recognition technology performs for these groups. In this paper we report the results of a thorough investigation into this question using a connected digit recognizer and a major telephone speech database. One would generally assume that recognizing speech from these groups is only a matter of having enough sufficiently representative training data. This turns out to be true only as long as the speakers are in the age range of 15 to approximately 70. Outside this range the error rates increase dramatically, even with balanced amounts of training data. For males, the lower limit is very sharp and can be attributed to the change of pitch frequency during puberty. For females, the lower limit is gradual and is caused only by the slowly changing dimensions of the vocal tract. For both genders, the upper limit is very gradual and can possibly be attributed to changes in the glottis area and the internal control loops of the human articulatory system. The paper presents some supporting evidence for the above assertions and gives results for various attempts to improve the performance. Recognition of speech from children and the elderly will require much more research if we are to fully understand how the characteristics of these age groups affect current and future speech recognition systems.