An intuitive style control technique in HMM-based expressive speech synthesis using subjective style intensity and multiple-regression global variance model

Authors:
Takashi Nose;Takao Kobayashi
Affiliations:
Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama 226-8502, Japan;Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama 226-8502, Japan
Venue:
Speech Communication
Year:
2013

Abstract

To enable intuitive control of the intensities of emotional expressions and speaking styles in synthetic speech, we introduce subjective style intensities and multiple-regression global variance (MRGV) models into hidden Markov model (HMM)-based expressive speech synthesis. One problem with conventional parametric style modeling and style control techniques is that the intensity of a style appearing in synthetic speech depends strongly on the training data. To alleviate this problem, the proposed technique explicitly takes into account the subjective style intensities perceived for the respective training utterances, using multiple-regression hidden semi-Markov models (MRHSMMs). As a result, synthetic speech becomes less sensitive to the variation in style expressivity present in the training data. Another problem is that synthetic speech generally suffers from over-smoothing of the model parameters during model training, so the variance of the generated speech parameter trajectory becomes smaller than that of natural speech. To alleviate this problem in the style control setting, we extend the conventional variance compensation method, which is based on a global variance (GV) model for single-style speech, to multiple styles with variable style intensities by deriving MRGV modeling. Objective and subjective experimental results show that these two techniques significantly enhance the intuitive style control of synthetic speech, which is essential for a speech synthesis system to convey para-linguistic information correctly to listeners.
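
As a rough illustration of the multiple-regression idea behind MRGV modeling, the sketch below computes a per-utterance global variance and regresses it on a style intensity vector, so that a GV mean can be predicted for an arbitrary intensity at synthesis time. It is not the authors' implementation: the abstract gives no formulas, so the linear form `gv ≈ H @ [1, v]`, the ordinary least-squares fit (standing in for whatever estimation the paper actually uses), and the function names `global_variance`, `fit_mrgv`, and `predict_gv_mean` are all assumptions made for illustration.

```python
import numpy as np


def global_variance(traj):
    """Per-dimension variance over time of one utterance's
    parameter trajectory, given as a (T, D) array."""
    return np.var(np.asarray(traj, dtype=float), axis=0)


def fit_mrgv(gvs, styles):
    """Least-squares fit of a regression matrix H of shape (D, L + 1)
    such that gv ≈ H @ [1, v] (assumed linear form, not from the source).

    gvs:    (N, D) per-utterance global variances
    styles: (N, L) subjective style intensity vectors, one per utterance
    """
    gvs = np.asarray(gvs, dtype=float)
    styles = np.asarray(styles, dtype=float)
    # Prepend a constant 1 to each style vector for the bias term.
    xi = np.hstack([np.ones((styles.shape[0], 1)), styles])
    coef, *_ = np.linalg.lstsq(xi, gvs, rcond=None)  # shape (L + 1, D)
    return coef.T


def predict_gv_mean(H, style):
    """GV mean predicted for a chosen style intensity vector."""
    xi = np.concatenate([[1.0], np.asarray(style, dtype=float)])
    return H @ xi
```

In use, `fit_mrgv` would be called once on the training utterances' GVs and their perceived intensity scores, and `predict_gv_mean` would supply the target GV for whatever intensity the user requests. The sketch ignores the covariance of the GV distribution and any context dependence, both of which a full model would need to handle.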