Photo-realistic talking-heads from image samples

  • Authors:
  • E. Cosatto; H. P. Graf

  • Affiliations:
  • AT&T Labs-Research, Red Bank, NJ

  • Venue:
  • IEEE Transactions on Multimedia
  • Year:
  • 2000


Abstract

This paper describes a system for creating a photo-realistic model of the human head that can be animated and lip-synched from phonetic transcripts of text. Combined with a state-of-the-art text-to-speech synthesizer (TTS), it generates video animations of talking heads that closely resemble real people. To obtain a natural-looking head, we choose a “data-driven” approach: we record a talking person and apply image recognition to automatically extract bitmaps of facial parts. These bitmaps are normalized and parameterized before being entered into a database. For synthesis, the TTS provides the audio track as well as the phonetic transcript, from which trajectories in the space of parameterized bitmaps are computed for all facial parts. Sampling these trajectories and retrieving the corresponding bitmaps from the database produces animated facial parts. These facial parts are then projected and blended onto an image of the whole head using its pose information. The resulting talking-head model can produce new, never-recorded speech of the person who was originally recorded. Talking-head animations of this type are useful as front-ends for agents and avatars in multimedia applications such as virtual operators, virtual announcers, help desks, and educational and expert systems.
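The synthesis step the abstract outlines — sampling a trajectory in the space of parameterized bitmaps and retrieving the closest recorded bitmap for each frame — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the parameter choices (mouth width/height), the toy database, and all function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "database": N recorded mouth bitmaps, each tagged with a parameter
# vector (here, hypothetically, mouth width and height) extracted during
# the analysis phase.
N, H, W = 50, 8, 12
db_params = rng.uniform(0.0, 1.0, size=(N, 2))      # one parameter vector per sample
db_bitmaps = rng.integers(0, 256, size=(N, H, W))   # grayscale mouth images

def retrieve(target_params):
    """Return the stored bitmap whose parameters are closest to the target."""
    dists = np.linalg.norm(db_params - target_params, axis=1)
    return db_bitmaps[np.argmin(dists)]

# A toy trajectory through parameter space (in the real system this would be
# computed from the TTS phonetic transcript), sampled at the video frame rate.
trajectory = np.stack([np.linspace(0.2, 0.8, 10),   # parameter 1 over time
                       np.linspace(0.1, 0.9, 10)])  # parameter 2 over time

# One retrieved bitmap per sampled frame; in the full system these would be
# blended onto a whole-head image using its pose information.
frames = [retrieve(p) for p in trajectory.T]
print(len(frames), frames[0].shape)
```

The nearest-neighbor lookup stands in for whatever retrieval scheme the actual system uses; the point is only that animation reduces to indexing the recorded bitmap database along a phoneme-driven trajectory.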