A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

Authors:
Tomoki Toda;Keiichi Tokuda
Affiliations:
-;-
Venue:
IEICE - Transactions on Information and Systems
Year:
2007

Citing 6
Cited 20

Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds

Speech Communication
Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis

Systems and Computers in Japan
Concatenative Speech Synthesis Based on the Plural Unit Selection and Fusion Method

IEICE - Transactions on Information and Systems
Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes

IEICE - Transactions on Information and Systems
Unit selection in a concatenative speech synthesis system using a large speech database

ICASSP '96 Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Selecting non-uniform units from a very large corpus for concatenative speech synthesizer

ICASSP '01 Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02

The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006

IEICE - Transactions on Information and Systems
Review: Statistical parametric speech synthesis

Speech Communication
Robust speaker-adaptive HMM-based text-to-speech synthesis

IEEE Transactions on Audio, Speech, and Language Processing
Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis

Speech Communication
Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech

Speech Communication
Mixing HMM-based Spanish speech synthesis with a CBR for prosody estimation

NOLISP'07 Proceedings of the 2007 international conference on Advances in nonlinear speech processing
Thousands of voices for HMM-based speech synthesis: analysis and application of TTS systems built on various ASR corpora

IEEE Transactions on Audio, Speech, and Language Processing
Statistical text-to-speech synthesis based on segment-wise representation with a norm constraint

IEEE Transactions on Audio, Speech, and Language Processing
Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis

Speech Communication
Speaker-independent HMM-based voice conversion using adaptive quantization of the fundamental frequency

Speech Communication
Very low bit-rate F0 coding for phonetic vocoders using MSD-HMM with quantized F0 symbols

Speech Communication
High quality emotional HMM-Based synthesis in Spanish

NOLISP'09 Proceedings of the 2009 international conference on Advances in Nonlinear Speech Processing
Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping

Speech Communication
INPRO_iSS: a component for just-in-time incremental speech synthesis

ACL '12 Proceedings of the ACL 2012 System Demonstrations
Combining incremental language generation and incremental speech synthesis for adaptive information presentation

SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue
An intuitive style control technique in HMM-based expressive speech synthesis using subjective style intensity and multiple-regression global variance model

Speech Communication
Evaluating the intelligibility benefit of speech modifications in known noise conditions

Speech Communication
Complex cepstrum for statistical parametric speech synthesis

Speech Communication
Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise

Computer Speech and Language
Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion

Computer Speech and Language

Abstract

This paper describes a novel parameter generation algorithm for HMM-based speech synthesis. The conventional algorithm generates the static-feature trajectory that maximizes the likelihood of a given HMM for the parameter sequence of static and dynamic features, under an explicit constraint between the two. Because of the statistical processing, the generated trajectory is often excessively smoothed, and speech synthesized from such over-smoothed parameters usually sounds muffled. To alleviate the over-smoothing effect, we propose a generation algorithm that considers not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for the global variance (GV) of the generated trajectory. The latter likelihood acts as a penalty on over-smoothing, i.e., on a reduction of the GV of the generated trajectory. A perceptual evaluation demonstrates that the proposed algorithm considerably improves the naturalness of synthetic speech.
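
The GV penalty described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes a single feature dimension, models the GV with a single Gaussian whose mean and variance (`gv_mean`, `gv_var`) are hypothetical values chosen for the toy data, and uses a moving average as a crude stand-in for the over-smoothing produced by the conventional generator. All function names are illustrative.

```python
import math


def global_variance(trajectory):
    """Variance of one feature dimension over all frames of an utterance."""
    mean = sum(trajectory) / len(trajectory)
    return sum((x - mean) ** 2 for x in trajectory) / len(trajectory)


def gaussian_log_likelihood(value, mean, variance):
    """Log density of a scalar under a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * variance)
                   + (value - mean) ** 2 / variance)


def moving_average(trajectory, width=5):
    """Stand-in for over-smoothing (assumption: NOT the paper's generator)."""
    half = width // 2
    smoothed = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half): t + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Toy "natural" trajectory and an over-smoothed version of it.
natural = [math.sin(0.9 * t) for t in range(50)]
smoothed = moving_average(natural)

# Hypothetical GV model: a Gaussian centred on the GV of the natural data.
gv_mean = global_variance(natural)
gv_var = 0.01  # assumed spread of GV values, chosen only for illustration

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)
ll_natural = gaussian_log_likelihood(gv_natural, gv_mean, gv_var)
ll_smoothed = gaussian_log_likelihood(gv_smoothed, gv_mean, gv_var)

print("GV natural: ", gv_natural, " GV log-likelihood:", ll_natural)
print("GV smoothed:", gv_smoothed, " GV log-likelihood:", ll_smoothed)
```

In this toy setting the smoothed trajectory has a smaller GV and therefore a lower GV log-likelihood, which is the sense in which the GV term penalizes over-smoothing. How the GV likelihood is weighted against the HMM likelihood is not stated in the abstract and is left out of the sketch.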