A Style Control Technique for HMM-Based Expressive Speech Synthesis

  • Authors:
  • Takashi Nose, Junichi Yamagishi, Takashi Masuko, Takao Kobayashi

  • Venue:
  • IEICE Transactions on Information and Systems
  • Year:
  • 2007

Abstract

This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles are modeled in a single model using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled with the MRHSMM, in which the mean parameters of the state output and duration distributions are expressed as multiple regressions of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified according to an arbitrarily given style vector, which corresponds to a point in a low-dimensional space, called the style space, each of whose coordinates represents a specific speaking style or emotion. Subjective evaluation results show that the style and its intensity can be controlled by changing the style vector.
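
The sketch below illustrates the multiple-regression idea summarized in the abstract: per-state mean parameters are linear functions of an extended style vector. It is a minimal illustration only; the function names, matrix shapes, and example numbers are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of the multiple-regression idea behind the MRHSMM.
# All names, shapes, and values below are illustrative assumptions.
#
# For each HSMM state, the means of the output and duration distributions
# are modeled as linear functions of an extended style vector
# xi = [1, v_1, ..., v_L]^T, where v is the L-dimensional style vector
# (one coordinate per speaking style / emotion in the style space).

def extended_style_vector(v):
    """Prepend the bias term 1 to the style vector v."""
    v = np.asarray(v, dtype=float)
    return np.concatenate(([1.0], v))

def regressed_means(H_out, h_dur, v):
    """Compute style-dependent mean parameters for one state.

    H_out : (D, L+1) regression matrix for the state-output mean (D = feature dim)
    h_dur : (L+1,)   regression vector for the state-duration mean
    v     : (L,)     style vector, e.g. intensities on the style-space axes
    """
    xi = extended_style_vector(v)
    mu_out = H_out @ xi      # mean of the state output distribution
    mu_dur = h_dur @ xi      # mean of the state duration distribution
    return mu_out, mu_dur

# Hypothetical example: 2-D style space, 3-D acoustic feature vector.
H_out = np.array([[0.0,  0.5, -0.2],
                  [1.0,  0.1,  0.3],
                  [2.0, -0.4,  0.0]])
h_dur = np.array([5.0, 1.5, -1.0])

# Scaling one coordinate of the style vector scales how strongly the
# corresponding style shifts the state means (i.e., its expressivity).
for v in ([0.0, 0.0], [0.5, 0.0], [1.0, 0.0]):
    mu_out, mu_dur = regressed_means(H_out, h_dur, v)
    print(v, mu_out, mu_dur)
```

In this sketch, moving the style vector along one axis of the assumed style space interpolates the state means between a neutral point (the bias column) and a fully expressive point, which mirrors the intensity-control behavior evaluated in the paper's listening tests.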