Photorealistic 2D audiovisual text-to-speech synthesis using active appearance models

  • Authors:
  • Wesley Mattheyses; Werner Verhelst

  • Affiliations:
  • Vrije Universiteit Brussel, Brussels, Belgium (both authors)

  • Venue:
  • Proceedings of the SSPNET 2nd International Symposium on Facial Analysis and Animation
  • Year:
  • 2010

Abstract

Audiovisual text-to-speech (AVTTS) synthesizers generate a synthetic audiovisual speech signal from an input text. One possible approach is model-based synthesis, in which the talking head is a 3D model whose polygons are deformed in accordance with the target speech. In contrast with such rule-based systems, data-driven synthesizers construct the target speech by reusing pre-recorded natural speech samples. The system we developed at the Vrije Universiteit Brussel is a data-driven 2D photorealistic synthesizer that creates a synthetic visual speech signal resembling standard 'newsreader-style' television recordings.
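
The active appearance models named in the title parameterize a face jointly by landmark shape and shape-normalized texture, typically by applying PCA to each. The abstract gives no code, so the following is only a minimal, hypothetical sketch of that parameterization in Python/NumPy; the class and function names, the training matrices `shapes` and `textures`, and the 98% variance cutoff are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def fit_pca(data, var_keep=0.98):
        """Return the mean and the principal modes retaining `var_keep`
        of the total variance (rows of `data` are training samples)."""
        mean = data.mean(axis=0)
        u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
        var = s ** 2 / (len(data) - 1)            # variance per component
        k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_keep)) + 1
        return mean, vt[:k]                        # modes stored as rows

    class ActiveAppearanceModel:
        """Hypothetical AAM: separate linear PCA models for landmark
        shape and for texture warped to the mean shape."""

        def __init__(self, shapes, textures):
            # shapes:   (N, 2L) flattened (x, y) landmark coordinates
            # textures: (N, P)  pixel intensities warped to the mean shape
            self.s_mean, self.s_modes = fit_pca(shapes)
            self.t_mean, self.t_modes = fit_pca(textures)

        def encode(self, shape, texture):
            """Project one sample onto the low-dimensional AAM parameters."""
            return (self.s_modes @ (shape - self.s_mean),
                    self.t_modes @ (texture - self.t_mean))

        def decode(self, b_shape, b_texture):
            """Reconstruct shape and texture from AAM parameters."""
            return (self.s_mean + self.s_modes.T @ b_shape,
                    self.t_mean + self.t_modes.T @ b_texture)

In a data-driven 2D synthesizer of the kind the abstract describes, such a low-dimensional shape/texture encoding would plausibly be what makes it practical to select, blend, and smooth pre-recorded mouth-region samples at concatenation boundaries before rendering photorealistic frames.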