A consistency analysis on an acoustic module for Mandarin text-to-speech

Authors:
Cheng-Yu Yeh;Shun-Chieh Chang;Shaw-Hwa Hwang
Affiliations:
Department of Electrical Engineering, National Chin-Yi University of Technology, 57, Sec. 2, Zhongshan Rd., Taiping Dist., Taichung 41170, Taiwan, ROC;Department of Electrical Engineering, National Taipei University of Technology, 1, Sec. 3, Chung-hsiao E. Rd., Taipei 10608, Taiwan, ROC;Department of Electrical Engineering, National Taipei University of Technology, 1, Sec. 3, Chung-hsiao E. Rd., Taipei 10608, Taiwan, ROC
Venue:
Speech Communication
Year:
2013



Abstract

This work presents a consistency analysis of the acoustic module of a Mandarin text-to-speech (TTS) system as a way to improve speech quality. Based on an inspection of the human pronunciation process, consistency is interpreted as a high correlation of the warping curve between the spectrum and the prosody within a syllable. The analysis proceeds in three steps. First, an HMM algorithm decodes the HMM-state sequence within a syllable and, at the same time, divides it into three segments. Second, for a designated syllable, vector quantization (VQ) with the Linde-Buzo-Gray (LBG) algorithm is used to train a VQ codebook for each segment. Third, the prosodic vector of each segment is encoded as an index by the VQ codebooks, and the probability of each possible path is evaluated as a prerequisite for analyzing the consistency. Experiments demonstrate that consistency is clearly obtained when the syllable is located in exactly the same word. These results suggest a research direction: the warping process between the spectrum and the prosody within a syllable must be considered in a TTS system to improve speech quality.
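The second and third steps can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the HMM segmentation is taken as already done, so each syllable token arrives as three prosodic vectors; the prosodic vectors are toy two-dimensional values; the LBG split uses an additive perturbation `eps`; and a "path" is taken to be the triple of codebook indices, with its probability estimated as a relative frequency. All function names are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode(vec, codebook):
    """Return the index of the codeword nearest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Train a codebook by LBG splitting; size is assumed to be a power of two."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into a pair perturbed by +eps and -eps.
        codebook = [[c + s * eps for c in cw] for cw in codebook for s in (1, -1)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[encode(v, codebook)].append(v)
            # Re-estimate each codeword; keep the old one if its cell is empty.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


def path_probabilities(tokens, codebooks):
    """Map each index path (one index per segment) to its relative frequency."""
    paths = Counter(
        tuple(encode(seg, cb) for seg, cb in zip(tok, codebooks))
        for tok in tokens
    )
    total = sum(paths.values())
    return {path: count / total for path, count in paths.items()}


# Toy data (an assumption for illustration): 16 tokens of one syllable, each
# with three segment-level prosodic vectors, in a "low" and a "high" style.
tokens = []
for i in range(8):
    j = 0.1 * i
    tokens.append(([1.0 + j, 1.0], [1.0 + j, 1.0], [1.0 + j, 1.0]))
    tokens.append(([9.0 + j, 2.0], [9.0 + j, 2.0], [9.0 + j, 2.0]))

# One two-entry codebook per segment, then the probability of each path.
codebooks = [lbg_codebook([tok[s] for tok in tokens], 2) for s in range(3)]
probs = path_probabilities(tokens, codebooks)
print(probs)
```

In this toy data the three segments of a token always share the same style, so the probability mass concentrates on a few paths. A flatter distribution over paths would indicate weaker consistency under this simplified reading.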