This paper proposes a two-module text-to-speech (TTS) system structure that bypasses the prosody model, which would otherwise predict numerical prosodic parameters for synthetic speech. Instead, the many instances of each basic unit in a large speech corpus are classified into categories by a classification and regression tree (CART), with the expectation of the weighted sum of squared regression errors of the prosodic features as the splitting criterion. Better prosody is achieved by keeping the diversity of prosodic features small among instances belonging to the same class. A multi-tier non-uniform unit selection method is also presented. It selects units by minimizing the concatenation cost of the whole utterance. Because the largest available suitable units are selected for concatenation, distortion caused by mismatches at concatenation points is minimized. According to an informal listening test, the synthesized speech is very natural and fluent.
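The splitting criterion can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation: each instance is assumed to carry a vector of prosodic features (here a hypothetical mean F0 in Hz and a duration in seconds), the per-feature weights are invented for illustration, and the "expectation" is taken to be the weighted squared error of the two child nodes averaged over all instances in the parent. The function names, context questions, and toy data are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Context = Dict[str, bool]
Instance = Tuple[Context, Sequence[float]]      # (context features, prosodic features)
Question = Tuple[str, Callable[[Context], bool]]


def weighted_sse(rows: List[Sequence[float]], weights: Sequence[float]) -> float:
    """Weighted sum, over features, of squared deviations from the node mean."""
    if not rows:
        return 0.0
    n = len(rows)
    total = 0.0
    for k, w in enumerate(weights):
        mean_k = sum(r[k] for r in rows) / n
        total += w * sum((r[k] - mean_k) ** 2 for r in rows)
    return total


def expected_split_error(left: List[Sequence[float]],
                         right: List[Sequence[float]],
                         weights: Sequence[float]) -> float:
    """Weighted squared error of both children, averaged over all instances."""
    n = len(left) + len(right)
    return (weighted_sse(left, weights) + weighted_sse(right, weights)) / n


def best_question(instances: List[Instance],
                  questions: List[Question],
                  weights: Sequence[float]) -> Tuple[str, float]:
    """Return the yes/no question whose split gives the lowest expected error."""
    best_name, best_err = "", float("inf")
    for name, ask in questions:
        yes = [x for ctx, x in instances if ask(ctx)]
        no = [x for ctx, x in instances if not ask(ctx)]
        if not yes or not no:        # a split must leave both children non-empty
            continue
        err = expected_split_error(yes, no, weights)
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err


# Toy data (assumed): prosodic vector = (mean F0 in Hz, duration in seconds).
TOY_INSTANCES: List[Instance] = [
    ({"phrase_final": True, "stressed": True}, (100.0, 0.30)),
    ({"phrase_final": True, "stressed": False}, (110.0, 0.32)),
    ({"phrase_final": False, "stressed": True}, (200.0, 0.15)),
    ({"phrase_final": False, "stressed": False}, (210.0, 0.17)),
]
TOY_QUESTIONS: List[Question] = [
    ("phrase_final", lambda ctx: ctx["phrase_final"]),
    ("stressed", lambda ctx: ctx["stressed"]),
]
TOY_WEIGHTS = (1.0, 100.0)   # assumed weights that bring the two scales closer

if __name__ == "__main__":
    print(best_question(TOY_INSTANCES, TOY_QUESTIONS, TOY_WEIGHTS))
```

On this toy data the `phrase_final` question wins, because it groups instances whose F0 and duration are close together. That is the property the abstract relies on: instances in the same leaf have similar prosody, so any of them can serve as a candidate for unit selection.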