Articulatory Speech Re-synthesis: Profiting from Natural Acoustic Speech Data

  • Authors:
  • Dominik Bauer; Jim Kannampuzha; Bernd J. Kröger

  • Affiliations:
  • Department of Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Aachen, Germany (all authors)

  • Venue:
  • Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions
  • Year:
  • 2009

Abstract

The quality of static phones (e.g. vowels, fricatives, nasals, laterals) generated by articulatory speech synthesizers has reached a high level in recent years. Our goal is to extend this quality to dynamic speech, i.e. whole syllables, words, and utterances, by re-synthesizing natural acoustic speech data. Re-synthesis means that vocal tract action units, or articulatory gestures, describing the succession of speech movements are adapted spatio-temporally with respect to a natural speech signal produced by a "model speaker" of Standard German. This adaptation is performed using the software tool SAGA (Sound and Articulatory Gesture Alignment), which is currently under development in our lab. The resulting action unit scores are stored in a database and serve as input for our articulatory speech synthesizer. This technique is designed to form the basis for unit-selection articulatory speech synthesis in the future.
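To make the re-synthesis idea concrete, the following sketch shows one plausible way to represent a gestural (action unit) score and adapt it temporally to a natural token. This is a minimal illustration, not SAGA's actual data model: the class names, articulator labels, and timing values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    # One vocal tract action unit: which articulator moves,
    # toward which target, over which time interval (seconds).
    # All field names and labels are illustrative assumptions.
    articulator: str   # e.g. "lips", "tongue_body", "glottis"
    target: str        # e.g. "labial_closure", "vowel_a"
    onset: float
    offset: float

    def retimed(self, scale: float, shift: float = 0.0) -> "Gesture":
        # Temporal adaptation: stretch and shift the gesture so its
        # interval lines up with the corresponding segment of the
        # natural acoustic signal.
        return Gesture(self.articulator, self.target,
                       self.onset * scale + shift,
                       self.offset * scale + shift)

# A toy gestural score for a syllable like /ba/ (invented values).
score = [
    Gesture("lips", "labial_closure", 0.00, 0.12),
    Gesture("glottis", "phonation", 0.05, 0.30),
    Gesture("tongue_body", "vowel_a", 0.08, 0.30),
]

# Align the score to a natural token that is 20% longer overall;
# the retimed score would then be stored and fed to the synthesizer.
aligned = [g.retimed(scale=1.2) for g in score]
```

In SAGA the spatio-temporal adaptation is of course driven by the natural signal rather than a single global scale factor, but the stored result is of this general kind: a timed sequence of action units per utterance.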