Conversational speech synthesis and the need for some laughter

Authors:
N. Campbell
Affiliations:
Commun. Technol. & the Speech & Acoust. Process. Dept., Nat. Inst. of Inf., Kyoto
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Abstract

This paper reports progress in the synthesis of conversational speech, drawing on the analysis of a very large corpus of expressive speech in normal everyday situations. With recent developments in concatenative techniques, speech synthesis has overcome the barrier of realistically portraying extra-linguistic information by using the actual voice of a recognizable person as a source for units, combined with minimal use of signal processing. However, the technology still faces the problem of expressing paralinguistic information, i.e., the variety in the types of speech and laughter that a person might use in everyday social interactions. Paralinguistic modification of an utterance portrays the speaker's affective states and shows his or her relationship with the listener through variations in the manner of speaking, by means of prosody and voice quality. These inflections are carried on the propositional content of an utterance, and can perhaps be modeled by rule, but they are also expressed through nonverbal utterances, the complexity of which may be beyond the capabilities of many current synthesis methods. We suggest that this problem may be solved by the use of phrase-sized utterance units taken intact from a large corpus.