Intonation modelling and adaptation for emotional prosody generation

  • Authors:
  • Zeynep Inanoglu; Steve Young

  • Affiliations:
  • Cambridge University Engineering Department, Machine Intelligence Laboratory, Cambridge, UK (both authors)

  • Venue:
  • ACII'05: Proceedings of the First International Conference on Affective Computing and Intelligent Interaction
  • Year:
  • 2005

Abstract

This paper proposes an HMM-based approach to generating emotional intonation patterns. A set of models was built to represent syllable-length intonation units. In a classification framework, the models were able to detect a sequence of intonation units from raw fundamental frequency values. Used in a generative framework, the models synthesized smooth and natural-sounding pitch contours. As a case study in emotional intonation generation, Maximum Likelihood Linear Regression (MLLR) adaptation was used to transform the neutral model parameters using a small amount of happy and sad speech data. Perceptual tests showed that listeners could identify the sad intonation 80% of the time. In contrast, listeners' ability to detect the system-generated happy intonation followed a bimodal distribution, and on average they identified happy intonation only 46% of the time.
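
As a rough illustration of the adaptation step mentioned in the abstract, the sketch below applies an MLLR mean transform to a set of Gaussian means, as would be done when shifting neutral intonation models toward an emotional target. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, array shapes, and the stand-in transform are illustrative. MLLR estimates a linear transform W = [b | A] from adaptation data and maps each model mean mu to an adapted mean mu_hat = A mu + b.

```python
import numpy as np

def apply_mllr_mean_transform(means, W):
    """Apply an MLLR mean transform to HMM Gaussian means.

    Illustrative sketch (not the paper's code). MLLR estimates a
    linear transform W = [b | A] from a small amount of adaptation
    data (here, happy or sad speech) and maps each neutral mean mu
    to an adapted mean  mu_hat = A @ mu + b.

    means : (N, d) array of neutral-model Gaussian mean vectors
    W     : (d, d+1) transform, with the bias b in the first column
    """
    b, A = W[:, 0], W[:, 1:]
    return means @ A.T + b

# Hypothetical usage: adapt ten 3-dimensional F0-feature means with a
# stand-in transform (a mild scaling plus a constant shift).
rng = np.random.default_rng(0)
neutral_means = rng.normal(size=(10, 3))
W = np.hstack([np.ones((3, 1)), np.eye(3) * 1.1])
adapted_means = apply_mllr_mean_transform(neutral_means, W)
```

In practice the transform W would be estimated by maximizing the likelihood of the emotional adaptation data under the transformed models, which is what lets a small amount of happy or sad speech reshape the full set of neutral intonation models.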