HMM-based emotional speech synthesis using average emotion model

  • Authors:
  • Long Qin, Zhen-Hua Ling, Yi-Jian Wu, Bu-Fan Zhang, Ren-Hua Wang

  • Affiliations:
  • iFLYTEK Speech Lab, University of Science and Technology of China, Hefei (all authors)

  • Venue:
  • ISCSLP'06: Proceedings of the 5th International Conference on Chinese Spoken Language Processing
  • Year:
  • 2006

Abstract

This paper presents a technique for synthesizing emotional speech based on an emotion-independent model called the “average emotion” model. The average emotion model is trained on a multi-emotion speech database. By applying an MLLR-based model adaptation method, the average emotion model can be transformed to represent a target emotion that is not included in the training data. A multi-emotion speech database covering four emotions, “neutral”, “happiness”, “sadness”, and “anger”, is used in the experiments. The results of subjective tests show that the average emotion model can effectively synthesize neutral speech and can be adapted toward a target emotion using very limited training data.
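For readers unfamiliar with MLLR mean adaptation, the sketch below illustrates the core idea in Python: the Gaussian means of the average emotion model are transformed as μ' = Aμ + b, where the global transform W = [A b] is estimated from a small amount of target-emotion adaptation data. This is a minimal illustration under simplifying assumptions, not the paper's implementation: it assumes identity covariances (so maximum-likelihood estimation reduces to least squares), known frame-to-state alignments, and a single global transform; all function names and the toy data are hypothetical.

```python
import numpy as np

def estimate_global_mllr(means, obs, align):
    """Estimate a global MLLR mean transform W = [A b].

    Under identity covariances, maximizing the likelihood of the
    adaptation frames reduces to least squares on the extended
    mean vectors xi = [mu; 1], i.e. o_t ~ W xi_{s_t}.
    means: (n_states, d) average-model Gaussian means
    obs:   (T, d) adaptation frames from the target emotion
    align: (T,) state index assigned to each frame
    """
    # Extended mean for each aligned frame: shape (T, d+1)
    X = np.hstack([means[align], np.ones((len(align), 1))])
    # Solve obs ~ X @ W.T by least squares
    Wt, *_ = np.linalg.lstsq(X, obs, rcond=None)
    return Wt.T  # shape (d, d+1)

def adapt_means(means, W):
    """Apply mu' = A mu + b to every Gaussian mean."""
    X = np.hstack([means, np.ones((len(means), 1))])
    return X @ W.T

# Toy usage: recover a known affine shift from simulated adaptation data.
rng = np.random.default_rng(0)
d, n_states, T = 3, 5, 200
avg_means = rng.normal(size=(n_states, d))
A_true = np.eye(d) * 1.2
b_true = np.array([0.5, -0.3, 0.1])
align = rng.integers(0, n_states, size=T)
obs = avg_means[align] @ A_true.T + b_true + 0.05 * rng.normal(size=(T, d))

W = estimate_global_mllr(avg_means, obs, align)
adapted = adapt_means(avg_means, W)  # means shifted toward the target emotion
```

In practice, MLLR systems typically cluster Gaussians into regression classes (one transform per class) and weight the estimation by the component covariances and state occupancies; the single least-squares transform above is the degenerate one-class, identity-covariance case.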