Speech Synthesis for Error Training Models in CALL

Authors:
Xin Zhang;Qin Lu;Jiping Wan;Guangguang Ma;Tin Shing Chiu;Weiping Ye;Wenli Zhou;Qiao Li
Affiliations:
Department of Electronics, Beijing Normal University, China;Department of Computing, Hong Kong Polytechnic University, China;Department of Electronics, Beijing Normal University, China;Department of Electronics, Beijing Normal University, China;Department of Computing, Hong Kong Polytechnic University, China;Department of Electronics, Beijing Normal University, China;Department of Electronics, Beijing Normal University, China;Department of Electronics, Beijing Normal University, China
Venue:
ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
Year:
2009

Abstract

A computer-assisted pronunciation teaching (CAPT) system is a fundamental component of a computer-assisted language learning (CALL) system. A speech-recognition-based CAPT system often requires a large amount of speech data to train the incorrect phone models in its speech recognizer, but collecting incorrectly pronounced speech data is labor-intensive and costly. This paper reports an effort to train the incorrect phone models using synthesized speech data. A special formant speech synthesizer is designed to filter correctly pronounced phones into incorrect phones by modifying their formant frequencies. Within a Chinese Putonghua CALL system that helps native Cantonese speakers learn Mandarin, a small experimental CAPT system is built whose recognizer is trained on synthetic speech data. Evaluation shows that a CAPT system using synthesized data can perform as well as, or even better than, one using real data, provided that the synthetic data set is large enough.
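To make the idea of modifying formant frequencies concrete, here is a minimal sketch in Python. It is not the authors' synthesizer: the abstract does not describe its design, so this example assumes a cascade of Klatt-style second-order digital resonators driven by an impulse train. The formant values, bandwidths, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical (frequency_hz, bandwidth_hz) pairs; NOT taken from the paper.
CORRECT_FORMANTS = [(300, 60), (2300, 90)]   # assumed target vowel
SHIFTED_FORMANTS = [(500, 60), (1900, 90)]   # assumed mispronounced variant


def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients of a Klatt-style resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(formants, fs=16000, f0=100, duration_ms=100):
    """Pass an impulse train at pitch f0 through a cascade of formant resonators."""
    n = fs * duration_ms // 1000
    period = fs // f0
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs)
    return signal


def magnitude_at(signal, freq_hz, fs):
    """Magnitude of the signal's spectrum at a single frequency (direct DFT sum)."""
    w = 2.0 * math.pi * freq_hz / fs
    re = sum(x * math.cos(w * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(w * i) for i, x in enumerate(signal))
    return math.hypot(re, im)


if __name__ == "__main__":
    correct = synthesize(CORRECT_FORMANTS)
    shifted = synthesize(SHIFTED_FORMANTS)
    for f in (300, 500):
        print(f, magnitude_at(correct, f, 16000), magnitude_at(shifted, f, 16000))
```

Under these assumptions, moving the first formant from 300 Hz to 500 Hz moves the spectral peak with it, which is the kind of change a recognizer's incorrect phone model would need to learn. A system like the one described would presumably start from recorded correct speech rather than an impulse train.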