Selecting non-uniform units from a very large corpus for concatenative speech synthesizer
ICASSP '01 Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
An adaptive algorithm for mel-cepstral analysis of speech
ICASSP '92 Proceedings of the 1992 IEEE International Conference on Acoustics, Speech and Signal Processing - Volume 1
HMM-Based Speech Synthesis for the Greek Language
TSD '08 Proceedings of the 11th International Conference on Text, Speech and Dialogue
Review: Statistical parametric speech synthesis
Speech Communication
Embedment of 3D virtual human into webpages for visual speech synthesis purpose
VECIMS '09 Proceedings of the 2009 IEEE International Conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems
Enrich web applications with voice internet persona text-to-speech for anyone, anywhere
HCI '07 Proceedings of the 12th International Conference on Human-Computer Interaction: Intelligent Multimodal Interaction Environments
IEEE Transactions on Audio, Speech, and Language Processing
In this paper we present our Hidden Markov Model (HMM)-based Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese, or Putonghua (“the common spoken language”), is a tone language in which each of its more than 400 base syllables can carry up to five different lexical tone patterns. The syllables' segmental and supra-segmental information is first modeled by three corresponding HMMs: (1) spectral envelope and gain; (2) voiced/unvoiced decision and fundamental frequency; and (3) segment duration. These HMMs are trained on a read-speech database of 1,000 sentences recorded by a female speaker. The spectral information is derived from short-time LPC spectral analysis. Among all LPC-derived parameterizations, Line Spectrum Pairs (LSPs) are the most closely related to the natural resonances, or “formants”, of a speech sound, so they are selected to parameterize the spectral information. Furthermore, because LSPs tend to cluster around a spectral peak, augmenting them with their dynamic counterparts, in both time and frequency, is justified for both HMM modeling and parameter trajectory synthesis. One hundred sentences synthesized by four LSP-based systems were subjectively evaluated in an AB comparison test. The listening test results show that LSPs augmented with their dynamic counterparts, in both time and frequency, are preferred for the higher quality of the resulting synthesized speech.
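The abstract does not say how the LSPs are computed or how their dynamic counterparts are defined. As a minimal sketch, assuming the textbook construction of the sum and difference polynomials P(z) and Q(z) from the LPC inverse filter A(z), the Python below locates the LSP frequencies by a grid search plus bisection on the unit circle, then forms two kinds of dynamic features: a central difference across frames (time) and the gaps between adjacent LSPs within a frame (frequency). The function names (`lpc_to_lsp`, `time_deltas`, `frequency_deltas`), the grid size, the two-frame regression window, and the adjacent-gap definition of the frequency dynamics are hypothetical illustrations, not the authors' implementation.

```python
import cmath
import math
from typing import List, Sequence


def _unit_circle_value(coeffs: Sequence[float], w: float, antisymmetric: bool) -> float:
    """Evaluate sum_k c_k e^{-jwk} with its linear phase removed.

    A symmetric coefficient list yields a purely real value and an antisymmetric
    one a purely imaginary value, so only the relevant part is returned.
    """
    centre = (len(coeffs) - 1) / 2.0
    total = sum(c * cmath.exp(-1j * w * (k - centre)) for k, c in enumerate(coeffs))
    return total.imag if antisymmetric else total.real


def _roots_in_open_interval(coeffs: Sequence[float], antisymmetric: bool,
                            grid: int, iters: int) -> List[float]:
    # Grid points strictly inside (0, pi) skip the trivial roots at z = +1 and z = -1.
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    vals = [_unit_circle_value(coeffs, w, antisymmetric) for w in ws]
    roots: List[float] = []
    for i in range(grid - 1):
        lo, hi, f_lo = ws[i], ws[i + 1], vals[i]
        if f_lo == 0.0:
            roots.append(lo)
        elif f_lo * vals[i + 1] < 0.0:
            for _ in range(iters):  # bisection on the bracketed sign change
                mid = 0.5 * (lo + hi)
                f_mid = _unit_circle_value(coeffs, mid, antisymmetric)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return roots


def lpc_to_lsp(a: Sequence[float], grid: int = 4096, iters: int = 60) -> List[float]:
    """LSP frequencies (radians, ascending, in (0, pi)) of A(z) = sum_k a[k] z^-k.

    Assumes a[0] == 1 and that A(z) is minimum phase, so the roots of P and Q
    lie on the unit circle and interlace.
    """
    p = len(a) - 1
    ext = list(a) + [0.0]  # pad so that a_{p+1} = 0
    # P(z) = A(z) + z^-(p+1) A(1/z) is symmetric; Q(z) = A(z) - z^-(p+1) A(1/z) is antisymmetric.
    P = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]
    Q = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]
    return sorted(_roots_in_open_interval(P, False, grid, iters)
                  + _roots_in_open_interval(Q, True, grid, iters))


def time_deltas(frames: List[List[float]]) -> List[List[float]]:
    """Time dynamics: (x[t+1] - x[t-1]) / 2 per LSP, edge frames repeated (assumed window)."""
    T = len(frames)
    out = []
    for t in range(T):
        prev_f, next_f = frames[max(t - 1, 0)], frames[min(t + 1, T - 1)]
        out.append([(n - q) / 2.0 for q, n in zip(prev_f, next_f)])
    return out


def frequency_deltas(lsp: Sequence[float]) -> List[float]:
    """Frequency dynamics (assumed definition): gaps between adjacent LSPs,
    including the distances to 0 and pi. Small gaps mark clustered LSPs."""
    edges = [0.0] + list(lsp) + [math.pi]
    return [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]


if __name__ == "__main__":
    r, theta = 0.9, math.pi / 4  # one resonance (pole pair) at angle pi/4
    lsp = lpc_to_lsp([1.0, -2.0 * r * math.cos(theta), r * r])
    print("LSPs:", lsp)  # the two values bracket theta
    print("frequency deltas:", frequency_deltas(lsp))
    print("time deltas:", time_deltas([lsp, lsp]))  # a static spectrum gives zeros
```

For a single pole pair at angle θ, the two LSPs fall on either side of θ, and as the pole radius r approaches 1 (a sharper resonance) both converge toward θ. That shrinking gap is the clustering around a spectral peak that the abstract cites as the motivation for frequency-domain dynamics.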