Learning Prosodic Patterns for Mandarin Speech Synthesis

Authors:
Yiqiang Chen;Wen Gao;Tingshao Zhu;Charles Ling
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People's Republic of China 100080. yqchen@ict.ac.cn;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People's Republic of China 100080. wgao@ict.ac.cn;Department of Computing Science, University of Alberta, Edmonton, Canada T6G 2E1. tszhu@cs.ualberta.ca;Department of Computer Science, University of Western Ontario, London, Ontario, Canada N6A 5B7. ling@csd.uwo.ca
Venue:
Journal of Intelligent Information Systems
Year:
2002

Abstract

Widespread use of text-to-speech (TTS) technology requires higher-quality synthesized speech. Prosody, which mainly describes the variation of pitch, is the key factor: poorly modeled prosodic patterns make synthetic speech sound unnatural and monotonous. The prosodic rules used in most Chinese TTS systems are constructed by experts, with weak quality control and low precision. In this paper, we propose a combination of clustering and machine learning techniques that extracts prosodic patterns from large databases of actual Mandarin speech in order to improve the naturalness and intelligibility of synthesized speech. Typical prosody models are found by cluster analysis. Several machine learning techniques, including rough sets, artificial neural networks (ANNs), and decision trees, are then trained to predict fundamental frequency and energy contours, which can be used directly in a TTS system based on pitch-synchronous overlap-add (PSOLA). Experimental results show that the synthesized prosodic features closely resemble their original counterparts for most syllables.
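The clustering step can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the contour representation, so everything here is an assumption made for illustration: k-means is used as a stand-in, each syllable's pitch contour is a 5-point normalized vector, and the data are synthetic shapes loosely resembling the four Mandarin tones rather than measurements from the paper's databases. The names `TEMPLATES`, `make_contours`, and `kmeans` are hypothetical.

```python
import random

# Hypothetical 5-point normalized F0 templates, loosely shaped like the
# four Mandarin tones. These values are invented for illustration only.
TEMPLATES = {
    "level":   [0.8, 0.8, 0.8, 0.8, 0.8],
    "rising":  [0.3, 0.4, 0.5, 0.7, 0.9],
    "dipping": [0.4, 0.2, 0.1, 0.3, 0.6],
    "falling": [0.9, 0.8, 0.6, 0.3, 0.1],
}


def make_contours(per_class=20, noise=0.03, seed=0):
    """Build synthetic (label, contour) pairs by adding small noise to each template."""
    rng = random.Random(seed)
    data = []
    for name, shape in TEMPLATES.items():
        for _ in range(per_class):
            data.append((name, [v + rng.uniform(-noise, noise) for v in shape]))
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Point-wise mean of a non-empty list of contours."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    pick the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain k-means: alternate nearest-center assignment and center update
    until the assignment stops changing or the iteration limit is reached."""
    centers = farthest_point_init(points, k)
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_assign == assign:
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = mean(members)
    return centers, assign


data = make_contours()
points = [contour for _, contour in data]
centers, assign = kmeans(points, k=4)

# Each center is one "typical" contour; print them rounded for inspection.
for j, center in enumerate(centers):
    print(j, [round(v, 2) for v in center])
```

In this sketch each resulting center plays the role of one typical prosody pattern. A learner such as a decision tree could then be trained to predict the pattern index for a syllable from its linguistic context, which is the kind of second stage the abstract describes; the features and training setup for that stage are not given here and are left out.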