On the importance of audiovisual coherence for the perceived quality of synthesized visual speech
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on animating virtual speakers or singers from audio: Lip-synching facial animation
Audiovisual text-to-speech (AVTTS) synthesizers generate a synthetic audiovisual speech signal from an input text. One possible approach is model-based synthesis, in which the talking head is a 3D model whose polygons are deformed in accordance with the target speech. In contrast to such rule-based systems, data-driven synthesizers construct the target speech by reusing pre-recorded natural speech samples. The system we developed at the Vrije Universiteit Brussel is a data-driven, 2D photorealistic synthesizer that creates a synthetic visual speech signal resembling standard 'newsreader-style' television recordings.
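The data-driven approach described above assembles the output by concatenating stored natural-speech samples, typically chosen by a unit-selection search that balances how well each candidate matches the target against how smoothly consecutive candidates join. A minimal sketch of such a search (the database layout, cost functions, and names are illustrative assumptions, not the authors' implementation) might look like:

```python
def select_units(target, database, target_cost, join_cost):
    """Pick one database unit per target symbol, minimizing the sum of
    per-unit target costs and join costs between consecutive units,
    via a Viterbi-style dynamic program.

    target:      list of symbols (e.g. phoneme labels)
    database:    dict mapping symbol -> list of candidate units
    target_cost: (symbol, unit) -> float, mismatch with the target
    join_cost:   (prev_unit, unit) -> float, discontinuity at the join
    """
    best = {}  # (position, unit) -> (accumulated cost, backpointer)
    for i, sym in enumerate(target):
        for u in database[sym]:
            if i == 0:
                best[(0, u)] = (target_cost(sym, u), None)
            else:
                # cheapest way to reach u from any previous candidate
                cost, back = min(
                    (best[(i - 1, p)][0] + join_cost(p, u), p)
                    for p in database[target[i - 1]]
                )
                best[(i, u)] = (cost + target_cost(sym, u), back)
    # trace back from the cheapest final unit
    last = min(database[target[-1]],
               key=lambda u: best[(len(target) - 1, u)][0])
    path = [last]
    for i in range(len(target) - 1, 0, -1):
        path.append(best[(i, path[-1])][1])
    return list(reversed(path))
```

In a real synthesizer the "units" would be audio or video segments and the costs would measure acoustic or visual similarity, but the search structure is the same.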