The effect of task conditions on the comprehensibility of synthetic speech

  • Authors:
  • Jennifer Lai, David Wood, Michael Considine

  • Affiliations:
  • IBM Corporation / T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, New York (Lai, Wood); Rice University, 6300 S. Main Street, Houston, Texas (Considine)

  • Venue:
  • Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
  • Year:
  • 2000

Abstract

A study was conducted with 78 subjects to evaluate the comprehensibility of synthetic speech on tasks ranging from short, simple e-mail messages to longer news articles on mostly obscure topics. Comprehension accuracy was measured for each subject for both synthetic speech and recorded human speech. Half the subjects were allowed to take notes while listening; the other half were not. Findings show no significant difference in comprehension of synthetic speech among the five text-to-speech engines used. Subjects who did not take notes performed significantly worse on all synthetic-voice tasks than on the recorded-speech tasks, and their performance on synthetic speech degraded as the tasks grew longer and more complex. Subjects who took notes also performed significantly worse in the synthetic-voice condition when scores were averaged across all six tasks. However, average scores for the last three tasks in this condition were comparable for human and synthetic speech, reflecting a training effect.