We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests was used to evaluate speech quality, emotion identification rates, and emotional strength for the six emotions that we recorded: happiness, sadness, anger, surprise, fear, and disgust. For the HMM-based method, we evaluated the spectral and source components separately and identified which components contribute to which emotions. Our analysis shows that, although the HMM-based method produces significantly better neutral speech, the two methods produce emotional speech of similar quality, except for emotions with context-dependent prosodic patterns. While synthetic speech produced with the unit selection method received higher emotional strength scores than that of the HMM-based method, only the HMM-based method is able to manipulate emotional strength. For emotions characterized by both spectral and prosodic components, synthetic speech produced with unit selection was identified more accurately by listeners; for emotions characterized mainly by prosodic components, HMM-based synthetic speech was identified more accurately. This finding differs from previous results on listener judgements of speaker similarity for neutral speech. We conclude that unit selection methods require improved prosodic modeling, and that HMM-based methods require improved spectral modeling, for emotional speech. Certain emotions cannot be reproduced well by either method.
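As a loose illustration of why only the statistical method can scale emotional strength (this is a sketch of the general style-interpolation idea, not the paper's actual implementation): an HMM-based system can interpolate model parameters, such as per-state mean vectors, between a neutral voice model and an emotional one, with an interpolation weight controlling the strength. All numbers below are hypothetical.

```python
import numpy as np

def interpolate_style(neutral_means, emotional_means, alpha):
    """Blend neutral and emotional model mean vectors.

    alpha = 0.0 gives the neutral model, 1.0 the full emotional model,
    and values above 1.0 exaggerate the emotion. Unit selection has no
    analogous continuous control: it can only pick recorded units.
    """
    neutral = np.asarray(neutral_means, dtype=float)
    emotional = np.asarray(emotional_means, dtype=float)
    return (1.0 - alpha) * neutral + alpha * emotional

# Toy example: hypothetical per-state F0 means (Hz) for one phone model.
neutral = [120.0, 118.0, 115.0]
angry = [160.0, 170.0, 150.0]
half_strength = interpolate_style(neutral, angry, 0.5)
# half_strength is [140.0, 144.0, 132.5]: halfway between the two styles.
```

In a real HSMM-based system the same weighting would be applied jointly to spectral, excitation, and duration parameters rather than to F0 alone.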