Is text-to-speech synthesis ready for use in computer-assisted language learning?

Authors:
Zöe Handley
Affiliations:
School of Computer Science, The University of Manchester, Lamb Building, Booth St. East, Manchester, M13 9EP, UK
(Footnote 1: The research was actually carried out in The School of Informatics, which has since ...)
Venue:
Speech Communication
Year:
2009

Abstract

Text-to-speech (TTS) synthesis, the generation of speech from text input, offers another means of providing spoken language input to learners in Computer-Assisted Language Learning (CALL) environments. Indeed, many potential benefits (ease of creation and editing of speech models, generation of speech models and feedback on demand, etc.) and uses (talking dictionaries, talking texts, dictation, pronunciation training, dialogue partner, etc.) of TTS synthesis in CALL have been put forward. Yet the use of TTS synthesis in CALL is not widely accepted, and only a few applications have found their way onto the market. One potential reason for this is that TTS synthesis has not been adequately evaluated for this purpose. Previous evaluations of TTS synthesis for use in CALL have addressed only its comprehensibility. Yet CALL places demands on the comprehensibility, naturalness, accuracy, register and expressiveness of the output of TTS synthesis. In this paper, these aspects of the output quality of four state-of-the-art French TTS synthesis systems are evaluated with respect to the three different roles that TTS synthesis systems may assume within CALL applications, namely: (1) reading machine, (2) pronunciation model and (3) conversational partner [Handley, Z., Hamel, M.-J., 2005. Establishing a methodology for benchmarking speech synthesis for computer-assisted language learning (CALL). Language Learning and Technology Journal 9(3), 99-119. Retrieved from: http://llt.msu.edu/vol9num3/handley/default.html.]. The results of this evaluation suggest that the best TTS synthesis systems are ready for use in applications in which they 'add value' to CALL, i.e. exploit the unique capacity of TTS synthesis to generate speech models on demand. An example of such an application is a dialogue partner. To fully meet the requirements of CALL, further attention needs to be paid to accuracy and naturalness, in particular at the prosodic level, and to expressiveness.