Trainable videorealistic speech animation

  • Authors:
  • Tony Ezzat;Gadi Geiger;Tomaso Poggio

  • Affiliations:
  • Center for Biological and Computational Learning, Massachusetts Institute of Technology, Cambridge, MA (all three authors)

  • Venue:
  • FGR '04: Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition
  • Year:
  • 2004

Abstract

We describe how to use machine learning techniques to create a generative, videorealistic speech animation module. A human subject is first recorded with a video camera as he or she utters a predetermined speech corpus. The corpus is then processed automatically, and a visual speech module is learned from the data. This module can synthesize the subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence that contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio or synthetic audio produced by a text-to-speech system, provided that it has been phonetically aligned.
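
The abstract requires phonetically aligned audio as input but does not say how the alignment is represented. As a rough illustration only, the sketch below assumes an alignment given as (phoneme, start, end) tuples in seconds and converts it into one phoneme label per video frame, which is the kind of frame-level timing a mouth synthesis module would need. The tuple format, the function name `phonemes_per_frame`, the default frame rate, and the `"sil"` silence label are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical alignment format (an assumption, not from the paper):
# (phoneme label, start time in seconds, end time in seconds).
Alignment = List[Tuple[str, float, float]]


def phonemes_per_frame(alignment: Alignment, fps: float = 30.0,
                       silence: str = "sil") -> List[str]:
    """Return one phoneme label per video frame.

    Each frame is labelled with the phoneme whose interval contains the
    frame's midpoint; frames covered by no interval get the silence label.
    """
    if not alignment:
        return []
    duration = max(end for _, _, end in alignment)
    n_frames = int(round(duration * fps))
    labels = []
    for i in range(n_frames):
        t = (i + 0.5) / fps  # midpoint of frame i, in seconds
        label = silence
        for phone, start, end in alignment:
            if start <= t < end:
                label = phone
                break
        labels.append(label)
    return labels


# Toy alignment sampled at 10 frames per second.
example = [("sil", 0.0, 0.1), ("hh", 0.1, 0.2), ("ay", 0.2, 0.4)]
frames = phonemes_per_frame(example, fps=10.0)
print(frames)
```

Sampling at the frame midpoint is one simple design choice; a real system might instead keep the continuous phoneme boundaries and let the synthesis model interpolate between them.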