Emotional style conversion in the TTS system with cepstral description

Authors:
Jiří Přibil;Anna Přibilová
Affiliations:
Institute of Photonics and Electronics, Academy of Sciences CR, v.v.i., Prague 8, Czech Republic;Slovak University of Technology, Faculty of Electrical Engineering & Information Technology, Dept. of Radio Electronics, Bratislava, Slovakia
Venue:
COST 2102'07 Proceedings of the 2007 COST action 2102 international conference on Verbal and nonverbal communication behaviours
Year:
2007

Abstract

This contribution describes experiments with emotional style conversion applied to utterances produced by the Czech and Slovak text-to-speech (TTS) system with cepstral description and basic prosody generated by rules. Emotional style conversion was realized both as post-processing of the TTS output speech signal and as a real-time implementation within the system. Emotional style prototypes representing three emotional states (sad, angry, and joyous) were obtained from sentences with the same information content. The mismatch between the lengths, in frames, of the prototype and the target utterance was resolved by linear time scale mapping (LTSM). The results were evaluated by a listening test of the resynthesized utterances.
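The idea behind linear time scale mapping (LTSM) can be illustrated with a short sketch. It assumes, hypothetically, that an emotional style prototype is stored as one value per frame (for example, a relative F0 ratio). That contour is then stretched or compressed linearly to the target utterance's frame count, interpolating between neighbouring prototype frames. The names `linear_time_scale_map` and `apply_prototype` and the per-frame ratio representation are illustrative assumptions, not the authors' implementation.

```python
def linear_time_scale_map(prototype, n_target):
    """Linearly map a per-frame prototype contour onto n_target frames.

    Hypothetical helper: target frame k is mapped to the fractional
    prototype position k * (n_src - 1) / (n_target - 1), and the value
    there is obtained by linear interpolation between neighbouring frames.
    """
    n_src = len(prototype)
    if n_src == 0 or n_target <= 0:
        raise ValueError("prototype must be non-empty and n_target positive")
    if n_target == 1:
        return [float(prototype[0])]
    if n_src == 1:
        # A single-frame prototype is simply repeated.
        return [float(prototype[0])] * n_target

    scale = (n_src - 1) / (n_target - 1)  # prototype frames per target frame
    mapped = []
    for k in range(n_target):
        pos = k * scale          # fractional position in the prototype
        i = int(pos)             # index of the left neighbour (floor)
        if i >= n_src - 1:
            mapped.append(float(prototype[-1]))
            continue
        frac = pos - i           # weight of the right neighbour
        mapped.append((1.0 - frac) * prototype[i] + frac * prototype[i + 1])
    return mapped


def apply_prototype(target_f0, prototype_ratios):
    """Hypothetical use: scale a target F0 contour by time-mapped prototype ratios."""
    ratios = linear_time_scale_map(prototype_ratios, len(target_f0))
    return [f0 * r for f0, r in zip(target_f0, ratios)]


if __name__ == "__main__":
    # A 3-frame prototype stretched to 5 target frames.
    print(linear_time_scale_map([0.0, 1.0, 2.0], 5))
    # A flat 100 Hz target contour modified by a rising 2-frame ratio prototype.
    print(apply_prototype([100.0, 100.0, 100.0], [1.0, 1.2]))
```

Linear mapping keeps the relative position of each prototype frame within the utterance, so the start and end of the prototype always align with the start and end of the target. It does not align individual phonemes or syllables, which is the simplicity trade-off of a purely linear time scale.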