This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and shifts the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
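As a rough illustration of the shift described above, the following minimal sketch selects a whole phrase from a corpus by how well its speaking style and voice quality match a requested affect, rather than by segmental join continuity. It is not the CHATR implementation: the `PhraseUnit` fields, the label values, the cost weights, and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PhraseUnit:
    """One phrase-sized corpus segment (hypothetical annotation scheme)."""
    text: str
    speaking_style: str   # e.g. "polite", "casual" (assumed labels)
    voice_quality: str    # e.g. "breathy", "modal" (assumed labels)
    f0_mean_hz: float     # mean fundamental frequency of the phrase


def target_cost(unit: PhraseUnit, style: str, quality: str,
                f0_target_hz: float, f0_weight: float = 0.01) -> float:
    """Penalise mismatches in style, voice quality, and mean pitch.

    The unit penalties and f0_weight are arbitrary assumptions.
    """
    cost = 0.0
    if unit.speaking_style != style:
        cost += 1.0
    if unit.voice_quality != quality:
        cost += 1.0
    cost += f0_weight * abs(unit.f0_mean_hz - f0_target_hz)
    return cost


def select_phrase(corpus: Iterable[PhraseUnit], text: str, style: str,
                  quality: str, f0_target_hz: float) -> Optional[PhraseUnit]:
    """Return the lowest-cost unit whose text matches, or None if absent."""
    candidates = [u for u in corpus if u.text == text]
    if not candidates:
        return None
    return min(candidates,
               key=lambda u: target_cost(u, style, quality, f0_target_hz))
```

With a corpus holding several recordings of the same phrase, a call such as `select_phrase(corpus, "thank you", "polite", "breathy", 220.0)` would return the recording whose annotations best fit the requested affect. A real system would need a fallback to smaller units when no whole phrase is available.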