Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis

  • Authors:
  • Junichi Yamagishi; Koji Onishi; Takashi Masuko; Takao Kobayashi

  • Affiliations:
  • The authors are with the Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama-shi, 226-8502 Japan. E-mail: junichi.yamagishi@ip.titech.ac.jp, ...

  • Venue:
  • IEICE - Transactions on Information and Systems
  • Year:
  • 2005


Abstract

This paper describes the modeling of various emotional expressions and speaking styles in synthetic speech using HMM-based speech synthesis. We present two methods for modeling speaking styles and emotional expressions. In the first method, style-dependent modeling, each speaking style and emotional expression is modeled individually. In the second, style-mixed modeling, each speaking style and emotional expression is treated as one of the contexts, along with phonetic, prosodic, and linguistic features, and all speaking styles and emotional expressions are modeled simultaneously with a single acoustic model. We chose four styles of read speech (neutral, rough, joyful, and sad) and compared the two modeling methods on these styles. Subjective evaluation tests show that both modeling methods achieve almost the same accuracy and that it is possible to synthesize speech whose speaking style and emotional expression are similar to those of the target speech. In a style classification test on synthesized speech, more than 80% of the samples generated with either model were judged to be similar to the target styles. We also show that the style-mixed modeling method yields fewer output and duration distributions than the style-dependent modeling method.
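
The contrast between the two methods can be pictured as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the label format, and the simplified context features are all hypothetical, and a real HMM-based synthesizer would build much richer full-context labels and cluster them with decision trees.

```python
# Minimal sketch (assumptions, not the paper's actual system): contrasting
# style-dependent modeling (one HMM set per style) with style-mixed modeling
# (style appended to the context label of a single HMM set).

from dataclasses import dataclass

STYLES = ("neutral", "rough", "joyful", "sad")  # the four read-speech styles used in the paper


@dataclass
class Frame:
    phone: str    # phonetic context (greatly simplified)
    accent: int   # prosodic context (greatly simplified)


def style_dependent_model_key(style: str) -> str:
    """Style-dependent modeling: select a separate acoustic model
    trained only on utterances of the given style."""
    assert style in STYLES
    return f"hmm_{style}"  # hypothetical identifier of a style-specific HMM set


def style_mixed_context_label(frame: Frame, style: str) -> str:
    """Style-mixed modeling: the style becomes one more context factor,
    alongside phonetic and prosodic features, so a single acoustic model
    covers all styles and can share parameters across them."""
    assert style in STYLES
    return f"{frame.phone}/A:{frame.accent}/S:{style}"  # hypothetical label format


if __name__ == "__main__":
    f = Frame(phone="a", accent=1)
    print(style_dependent_model_key("joyful"))     # -> hmm_joyful
    print(style_mixed_context_label(f, "joyful"))  # -> a/A:1/S:joyful
```

Under this view, the abstract's final observation is intuitive: because style-mixed modeling lets the clustering share distributions across styles wherever they are acoustically similar, it can end up with fewer output and duration distributions than training four independent style-dependent models.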