Transferable videorealistic speech animation

  • Authors:
  • Yao-Jen Chang; Tony Ezzat

  • Affiliations:
  • Computer and Communications Laboratories, ITRI, Taiwan; Center for Biological and Computational Learning, MIT

  • Venue:
  • Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation
  • Year:
  • 2005

Abstract

Image-based videorealistic speech animation achieves significant visual realism, but at the cost of collecting a large 5- to 10-minute video corpus from the specific person to be animated. This requirement hinders its use in broad applications, since a large video corpus for a specific person under a controlled recording setup may not be easily obtained. In this paper, we propose a model transfer and adaptation algorithm that allows a novel person to be animated using only a small video corpus. The algorithm starts with a multidimensional morphable model (MMM) previously trained on a different speaker with a large corpus and transfers it to the novel speaker with a much smaller corpus. The algorithm consists of (1) a novel matching-by-synthesis algorithm, which semi-automatically selects new MMM prototype images from the new video corpus, and (2) a novel gradient descent linear regression algorithm, which adapts the MMM phoneme models to the data in the novel video corpus. Encouraging experimental results are presented in which a morphable model trained on a performer with a 10-minute corpus is transferred to a novel person using a 15-second movie clip of him as the adaptation video corpus.
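The second component described in the abstract, gradient descent linear regression over the MMM phoneme models, can be pictured with a short sketch. The code below is not taken from the paper; it is a minimal illustration that assumes each phoneme model is summarized by a mean vector of MMM parameters and that adaptation learns one shared affine transform (W, b) mapping source-speaker means toward the new speaker's data by gradient descent. All names (`adapt_phoneme_means`, `source_means`, `target_samples`) are hypothetical.

```python
# Hypothetical sketch of gradient-descent linear-regression adaptation of
# per-phoneme MMM means to a new speaker (not the authors' implementation).
import numpy as np

def adapt_phoneme_means(source_means, target_samples, lr=1e-3, n_iters=500):
    """
    source_means:   dict phoneme -> (d,) mean MMM parameter vector (source speaker)
    target_samples: dict phoneme -> (n_p, d) MMM parameters observed for the new speaker
    Returns adapted means mu'_p = W @ mu_p + b for every phoneme.
    """
    d = next(iter(source_means.values())).shape[0]
    W = np.eye(d)           # start from the identity transform
    b = np.zeros(d)

    for _ in range(n_iters):
        gW = np.zeros_like(W)
        gb = np.zeros_like(b)
        for p, mu in source_means.items():
            X = target_samples.get(p)
            if X is None or len(X) == 0:
                continue                       # phoneme unseen in the short adaptation clip
            r = (W @ mu + b) - X.mean(axis=0)  # residual against the new speaker's sample mean
            n = len(X)
            gW += n * np.outer(r, mu)          # gradient of 0.5 * n * ||r||^2 w.r.t. W
            gb += n * r
        W -= lr * gW
        b -= lr * gb

    return {p: W @ mu + b for p, mu in source_means.items()}
```

Under this reading, a single shared transform is what makes a 15-second adaptation clip plausible: phonemes that are rare or absent in the short clip are still moved toward the new speaker through the same (W, b) estimated from whatever data is available.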