Applying an analysis of acted vocal emotions to improve the simulation of synthetic speech

  • Authors:
  • Iain R. Murray; John L. Arnott

  • Affiliation:
  • School of Applied Computing, University of Dundee, Dundee DD1 4HN, United Kingdom

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2008


Abstract

All speech produced by humans carries information about the speaker, including the speaker's emotional state. It is therefore desirable to include vocal affect in synthetic speech wherever the naturalness of the generated speech is important. However, the speech factors that convey affect are poorly understood, and their implementation in synthetic speech systems is not yet commonplace. A prototype system for producing emotional synthetic speech with a commercial formant synthesiser was developed, based on descriptions of vocal emotion given in the literature. This paper describes work to improve and augment that system, based on a detailed investigation of emotive material spoken by two actors (one amateur, one professional). The results of this analysis are summarised and were used to enhance the emotion rules employed by the speech synthesis system. The enhanced system was evaluated by naive listeners in a perception experiment, and the simulated emotions were found to be more realistic than those of the original version of the system.
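
To make the rule-based approach concrete, the sketch below shows one plausible way such emotion rules might be represented: a per-emotion table of prosodic offsets applied to a neutral baseline before the settings are passed to a formant synthesiser. This is a minimal illustration only; the parameter names, the offset values, and the `EMOTION_RULES` table are all invented for the example and are not the rules defined in the paper or the parameter set of any particular synthesiser.

```python
# Illustrative sketch of a rule-based vocal-emotion layer for a formant
# synthesiser. All parameters and values here are hypothetical examples,
# not the rules from Murray & Arnott's system.
from dataclasses import dataclass, replace as dc_replace


@dataclass(frozen=True)
class ProsodySettings:
    f0_mean_hz: float        # average pitch
    f0_range_hz: float       # pitch excursion around the mean
    speech_rate_wpm: float   # speaking rate, words per minute
    loudness_db: float       # overall intensity


# Hypothetical rule table: per-emotion offsets applied to a neutral baseline.
# A real system would derive these from acoustic analyses of emotive speech.
EMOTION_RULES = {
    "anger":     dict(f0_mean_hz=+20, f0_range_hz=+30, speech_rate_wpm=+25, loudness_db=+6),
    "happiness": dict(f0_mean_hz=+15, f0_range_hz=+25, speech_rate_wpm=+10, loudness_db=+3),
    "sadness":   dict(f0_mean_hz=-15, f0_range_hz=-20, speech_rate_wpm=-30, loudness_db=-4),
}


def apply_emotion(baseline: ProsodySettings, emotion: str) -> ProsodySettings:
    """Return a copy of the baseline settings with the emotion's offsets applied."""
    offsets = EMOTION_RULES[emotion]
    return dc_replace(baseline, **{
        field: getattr(baseline, field) + delta
        for field, delta in offsets.items()
    })


if __name__ == "__main__":
    neutral = ProsodySettings(f0_mean_hz=120, f0_range_hz=40,
                              speech_rate_wpm=160, loudness_db=60)
    print(apply_emotion(neutral, "anger"))
    print(apply_emotion(neutral, "sadness"))
```

One attraction of this table-of-offsets design is that the rules stay declarative: refinements derived from new acoustic analyses, such as the actor study reported in the paper, can be incorporated by editing the table rather than the synthesis code.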