Comparing humans and automatic speech recognition systems in recognizing dysarthric speech

  • Authors:
  • Kinfe Tadesse Mengistu; Frank Rudzicz

  • Affiliations:
  • University of Toronto, Department of Computer Science, Toronto, Ontario, Canada (both authors)

  • Venue:
  • Canadian AI '11: Proceedings of the 24th Canadian Conference on Advances in Artificial Intelligence
  • Year:
  • 2011

Abstract

Speech is a complex process that requires the control and coordination of articulation, breathing, voicing, and prosody. Dysarthria is a manifestation of an inability to control and coordinate one or more of these aspects, resulting in poorly articulated and often barely intelligible speech. Hence, individuals with dysarthria are rarely understood by human listeners. In this paper, we compare how well dysarthric speech is recognized by an automatic speech recognition (ASR) system and by naïve adult human listeners. The results show that, despite the encouraging performance of ASR systems and contrary to claims in other studies, human listeners perform better on average at recognizing single-word dysarthric speech. In particular, the mean word recognition accuracy of speaker-adapted monophone ASR systems on stimuli produced by six dysarthric speakers is 68.39%, while the mean percentage of correct responses from 14 naïve human listeners on the same speech is 79.78%, as measured by a single-word multiple-choice intelligibility test.
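
For readers unfamiliar with the two metrics, the following is a minimal sketch of how they are conventionally computed. The abstract itself does not define them; the symbols WAcc, PC, N, S, D, and I are our notation, and the formulas reflect the standard ASR and intelligibility-test conventions rather than anything specific to this paper.

    % Word recognition accuracy for the ASR system (standard definition:
    % N = words in the reference transcript; S, D, I = substitutions,
    % deletions, and insertions from a Levenshtein alignment of the
    % recognizer's hypothesis against the reference).
    \[
      \mathrm{WAcc} = \frac{N - S - D - I}{N} \times 100\%
    \]
    % Listener score in the multiple-choice intelligibility test: the share
    % of single-word stimuli for which the listener selected the spoken word.
    \[
      \mathrm{PC} = \frac{\text{correct responses}}{\text{total stimuli}} \times 100\%
    \]

Note that the two percentages arise from somewhat different tasks: the ASR system decodes against its vocabulary, while listeners in a multiple-choice test select from a closed set of alternatives, so the figures are comparable only in the broad sense the abstract intends.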