Applying an analysis of acted vocal emotions to improve the simulation of synthetic speech

  • Authors:
  • Iain R. Murray; John L. Arnott

  • Affiliation:
  • School of Applied Computing, University of Dundee, Dundee DD1 4HN, United Kingdom

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2008


Abstract

All speech produced by humans carries information about the speaker, including the speaker's emotional state. It is therefore desirable to include vocal affect in synthetic speech wherever the naturalness of the generated speech is important. However, the speech factors that convey affect are poorly understood, and their implementation in synthetic speech systems is not yet commonplace. A prototype system for producing emotional synthetic speech with a commercial formant synthesiser was developed, based on descriptions of vocal emotion given in the literature. This paper describes work to improve and augment that system, based on a detailed investigation of emotive material spoken by two actors (one amateur, one professional). The results of this analysis are summarised and were used to enhance the emotion rules employed by the speech synthesis system. The enhanced system was evaluated by naive listeners in a perception experiment, and the simulated emotions were found to be more realistic than those of the original version of the system.
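
To make the rule-based approach concrete, the sketch below shows one plausible way such emotion rules might be represented: a per-emotion table of prosodic offsets applied to a neutral baseline before the settings are passed to a formant synthesiser. This is a minimal illustration only; the parameter names, the offset values, and the `EMOTION_RULES` table are all invented for the example and are not the rules defined in the paper or the parameter set of any particular synthesiser.

```python
# Illustrative sketch of a rule-based vocal-emotion layer for a formant
# synthesiser. All parameters and values here are hypothetical examples,
# not the rules from Murray & Arnott's system.
from dataclasses import dataclass, replace as dc_replace


@dataclass(frozen=True)
class ProsodySettings:
    f0_mean_hz: float        # average pitch
    f0_range_hz: float       # pitch excursion around the mean
    speech_rate_wpm: float   # speaking rate, words per minute
    loudness_db: float       # overall intensity


# Hypothetical rule table: per-emotion offsets applied to a neutral baseline.
# A real system would derive these from acoustic analyses of emotive speech.
EMOTION_RULES = {
    "anger":     dict(f0_mean_hz=+20, f0_range_hz=+30, speech_rate_wpm=+25, loudness_db=+6),
    "happiness": dict(f0_mean_hz=+15, f0_range_hz=+25, speech_rate_wpm=+10, loudness_db=+3),
    "sadness":   dict(f0_mean_hz=-15, f0_range_hz=-20, speech_rate_wpm=-30, loudness_db=-4),
}


def apply_emotion(baseline: ProsodySettings, emotion: str) -> ProsodySettings:
    """Return a copy of the baseline settings with the emotion's offsets applied."""
    offsets = EMOTION_RULES[emotion]
    return dc_replace(baseline, **{
        field: getattr(baseline, field) + delta
        for field, delta in offsets.items()
    })


if __name__ == "__main__":
    neutral = ProsodySettings(f0_mean_hz=120, f0_range_hz=40,
                              speech_rate_wpm=160, loudness_db=60)
    print(apply_emotion(neutral, "anger"))
    print(apply_emotion(neutral, "sadness"))
```

One attraction of this table-of-offsets design is that the rules stay declarative: refinements derived from new acoustic analyses, such as the actor study reported in the paper, can be incorporated by editing the table rather than the synthesis code.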