Trainable videorealistic speech animation

  • Authors: Tony Ezzat, Gadi Geiger, Tomaso Poggio

  • Affiliations: Massachusetts Institute of Technology (all authors)

  • Venue: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 2002)

  • Year: 2002


Abstract

We describe how to create, with machine learning techniques, a generative speech animation module. A human subject is first recorded using a video camera as he/she utters a predetermined speech corpus. After processing the corpus automatically, a visual speech module is learned from the data that is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence which contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as they have been phonetically aligned.

The two key contributions of this paper are 1) a variant of the multidimensional morphable model (MMM) to synthesize new, previously unseen mouth configurations from a small set of mouth image prototypes; and 2) a trajectory synthesis technique based on regularization, which is automatically trained from the recorded video corpus, and which is capable of synthesizing trajectories in MMM space corresponding to any desired utterance.
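The regularization-based trajectory synthesis can be pictured as a penalized least-squares problem in MMM parameter space: the trajectory should stay close to per-frame targets derived from the phonetically aligned input while remaining smooth over time. The sketch below is a minimal illustration of that idea, not the paper's exact formulation; the function name `synthesize_mmm_trajectory`, the finite-difference smoothness penalty, and the assumption that per-frame targets and weights have already been computed are all simplifications introduced here (the paper learns phoneme-specific statistics from the recorded corpus).

```python
import numpy as np

def synthesize_mmm_trajectory(targets, weights, lam=1.0, order=2):
    """Solve for a smooth trajectory Y in MMM parameter space.

    targets: (T, D) per-frame target points in MMM space
             (e.g. derived from the phoneme labels of the utterance)
    weights: (T,) per-frame confidence of each target
    lam:     regularization strength trading target fidelity for smoothness
    order:   order of the finite-difference smoothness penalty
    """
    T, _ = targets.shape

    # Build an order-th finite-difference operator D so that ||D @ Y||^2
    # penalizes rough, jerky trajectories.
    D = np.eye(T)
    for _ in range(order):
        D = np.diff(D, axis=0)

    W = np.diag(weights)

    # Minimize  sum_t w_t ||y_t - target_t||^2 + lam * ||D @ Y||^2.
    # Setting the gradient to zero yields the linear system (W + lam D'D) Y = W T.
    A = W + lam * (D.T @ D)
    return np.linalg.solve(A, W @ targets)

if __name__ == "__main__":
    # Toy example: 10 frames, 4-dimensional MMM parameter space.
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(10, 4))
    weights = np.ones(10)
    traj = synthesize_mmm_trajectory(targets, weights, lam=5.0)
    print(traj.shape)  # (10, 4): one smoothed MMM parameter vector per frame
```

Each row of the resulting trajectory would then be handed to the morphable model, which morphs among the recorded mouth prototypes to render the corresponding frame before compositing onto the background sequence.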