Application of Expressive Speech in TTS System with Cepstral Description

Authors:
Jiří Přibil; Anna Přibilová
Affiliations:
Institute of Photonics and Electronics, Academy of Sciences CR, v.v.i., Prague 8, Czech Republic CZ-182 51; Faculty of Electrical Engineering & Information Technology, Dept. of Radio Electronics, Slovak University of Technology, Bratislava, Slovakia SK-812 19
Venue:
Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction
Year:
2008

Abstract

Expressive speech synthesis representing different human emotions has long been of interest to researchers. Recently, some experiments with the storytelling speaking style have been performed. This speaking style is suitable for applications aimed at children as well as for special applications aimed at blind people. By analyzing the speech of human storytellers, we designed a set of prosodic parameter prototypes for converting speech produced by the text-to-speech (TTS) system into storytelling speech. In addition to the suprasegmental characteristics (pitch, intensity, and duration) included in these speech prototypes, we also used information about the significant frequencies of the spectral envelope and about the spectral flatness, which determines the degree of voicing.
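
The abstract names spectral flatness as the indicator of the degree of voicing but does not define it. As a rough illustration only, the sketch below assumes the common definition (geometric mean of the power spectrum divided by its arithmetic mean), which may differ from the measure used in the paper. The function names `power_spectrum` and `spectral_flatness` and the `eps` floor are hypothetical choices made for this example.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum of a real-valued frame (bins 0..N/2).

    Assumption: a plain O(N^2) DFT is used only to keep the sketch
    dependency-free; a real system would use an FFT.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum.

    Assumption: this is the textbook definition, not necessarily the
    paper's. Values near 1 suggest a noise-like (unvoiced) frame,
    values near 0 a tonal (voiced) frame. `eps` avoids log(0).
    """
    floored = [p + eps for p in power]
    log_mean = sum(math.log(p) for p in floored) / len(floored)
    arithmetic_mean = sum(floored) / len(floored)
    return math.exp(log_mean) / arithmetic_mean


# Usage: a single sinusoid (strongly tonal) versus a unit impulse
# (perfectly flat spectrum).
tone = [math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
impulse = [1.0] + [0.0] * 15
tone_flatness = spectral_flatness(power_spectrum(tone))
impulse_flatness = spectral_flatness(power_spectrum(impulse))
```

Under this assumed definition, the sinusoid yields a flatness close to 0 and the impulse a flatness close to 1, which is the sense in which the measure can separate voiced from unvoiced frames.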