Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis

Authors:
Sebastian Andersson;Junichi Yamagishi;Robert A. J. Clark
Affiliations:
The Centre for Speech Technology Research, University of Edinburgh, Informatics Forum, 10 Crichton Street, Edinburgh EH8 9AB, UK;The Centre for Speech Technology Research, University of Edinburgh, Informatics Forum, 10 Crichton Street, Edinburgh EH8 9AB, UK;The Centre for Speech Technology Research, University of Edinburgh, Informatics Forum, 10 Crichton Street, Edinburgh EH8 9AB, UK
Venue:
Speech Communication
Year:
2012

Abstract

Spontaneous conversational speech has many characteristics that are currently not modelled well by HMM-based speech synthesis. To build synthetic voices that give the impression of someone taking part in a conversation, we need to use data that exhibits more of the speech phenomena associated with conversations than the carefully read-aloud sentences that are more generally used. In this paper we show that synthetic voices built with HMM-based speech synthesis techniques from conversational speech data preserve the segmental and prosodic characteristics of frequent conversational speech phenomena. An analysis of an evaluation of the perceived quality and speaking style of HMM-based voices confirms that speech with conversational characteristics is instrumental for listeners to perceive successful integration of conversational speech phenomena in synthetic speech. The achieved synthetic speech quality provides an encouraging start for the continued use of conversational speech in HMM-based speech synthesis.