Synthesizing a talking mouth

  • Authors: Ziheng Zhou, Guoying Zhao, Matti Pietikäinen
  • Affiliation: University of Oulu, Oulu, Finland (all authors)

  • Venue: Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP)
  • Year: 2010

Abstract

This paper presents a visually realistic animation system for synthesizing a talking mouth. Video synthesis is achieved by first learning generative models from recorded speech videos and then using the learned models to generate videos for novel utterances. A generative model treats the whole utterance contained in a video as a continuous process and represents it with a set of trigonometric functions embedded within a path graph. The transformation that projects the values of these functions into the image space is found through graph embedding. Such a model allows mouth images to be synthesized at arbitrary positions within the utterance. To synthesize a video for a novel utterance, the utterance is first compared with the recorded ones to find the phoneme combinations that best approximate it. Based on the learned models, dense videos are synthesized, concatenated and downsampled. A new generative model is then built on the remaining image samples for the final video synthesis.
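To make the generative model concrete, the sketch below illustrates one plausible reading of the abstract: the frames of an utterance are placed on a path graph, whose Laplacian eigenvectors are sampled cosines, and a linear map from those trigonometric basis values to image space is fit by least squares. Evaluating the cosines at fractional positions then yields images "at arbitrary positions in the utterance." This is a minimal sketch under those assumptions; the function names, the least-squares fit, and the exact cosine indexing are illustrative, not the paper's precise formulation.

```python
import numpy as np

def trig_basis(positions, n_frames, n_components):
    """Evaluate the path-graph trigonometric basis at (possibly fractional)
    frame positions. Assumption: the basis functions are the cosine
    eigenvectors of a path graph's Laplacian, evaluated continuously."""
    k = np.arange(1, n_components + 1)  # skip the constant eigenvector
    arg = np.pi * np.outer(2.0 * np.asarray(positions, float) + 1.0, k)
    return np.cos(arg / (2.0 * n_frames))

def learn_model(frames, n_components=8):
    """Fit the linear map from trig-basis values to image space by least
    squares (a stand-in for the graph-embedding projection step)."""
    n = len(frames)
    X = frames.reshape(n, -1)                  # one vectorized image per row
    Y = trig_basis(np.arange(n), n, n_components)
    mean = X.mean(axis=0)
    P, *_ = np.linalg.lstsq(Y, X - mean, rcond=None)
    return P, mean, n, frames.shape[1:]

def synthesize(model, positions):
    """Render mouth images at arbitrary, possibly fractional, positions."""
    P, mean, n, shape = model
    Y = trig_basis(positions, n, P.shape[0])
    return (Y @ P + mean).reshape((-1, *shape))

# Toy demo on random data standing in for recorded mouth frames.
frames = np.random.rand(40, 32, 32)
model = learn_model(frames)
imgs = synthesize(model, [0.0, 3.5, 17.25])    # fractional positions allowed
print(imgs.shape)                              # (3, 32, 32)
```

Because the basis columns are smooth functions of position, querying the model between training frames interpolates the mouth appearance continuously, which is what lets the dense videos for a novel utterance be synthesized, concatenated and downsampled before the final model is rebuilt.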