Transferring of Speech Movements from Video to 3D Face Space

Authors:
Yuru Pei;Hongbin Zha
Venue:
IEEE Transactions on Visualization and Computer Graphics
Year:
2007


Abstract

We present a novel method for transferring speech animation recorded in low-quality videos to high-resolution 3D face models. The basic idea is to synthesize the animated faces by interpolating among a small set of 3D key face shapes that span a 3D face space. The 3D key shapes are identified by an unsupervised learning process in 2D video space, which yields a set of 2D visemes that are then mapped to the 3D face space. The learning process consists of two main phases: 1) Isomap-based nonlinear dimensionality reduction, which embeds the video speech movements into a low-dimensional manifold, and 2) K-means clustering in the low-dimensional space, which extracts the 2D key viseme frames. Our main contribution is the use of the Isomap-based learning method to extract the intrinsic geometry of the speech video space, which makes it possible to define the 3D key viseme shapes. With this approach, we need to capture only a limited number of 3D key face models using a general 3D scanner. Moreover, we develop a skull movement recovery method based on simple anatomical structures to enhance 3D realism in local mouth movements. Experimental results show that our method can achieve realistic 3D animation effects with a small number of 3D key face models.
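
The two learning phases named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the per-frame feature vectors, the neighborhood size, the number of clusters, and the function names `isomap`, `kmeans`, and `key_frames` are all assumptions made for illustration. The sketch embeds the frames with a basic Isomap (nearest-neighbor graph, graph shortest paths, classical MDS), clusters the embedding with plain K-means, and picks the frame closest to each cluster center as a candidate key viseme frame.

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=2):
    """Basic Isomap: kNN graph -> graph geodesics -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between per-frame feature vectors.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Keep only edges to each frame's nearest neighbors, then symmetrize.
    G = np.full((n, n), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall shortest paths approximate geodesic distances.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Classical MDS: double-center squared distances, keep top eigenvectors.
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_components]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))


def assign(Y, centers):
    """Index of the nearest center for every embedded frame."""
    d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)


def kmeans(Y, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means, initialized from randomly chosen frames."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Y, centers)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = Y[labels == j].mean(axis=0)
    return assign(Y, centers), centers


def key_frames(Y, centers):
    """For each cluster center, the index of the closest embedded frame."""
    d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=0)


if __name__ == "__main__":
    # Hypothetical stand-in for video features: 60 frames on a smooth 3D curve.
    t = np.linspace(0.0, 1.5 * np.pi, 60)
    X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])
    Y = isomap(X, n_neighbors=5, n_components=2)
    labels, centers = kmeans(Y, k=4)
    print("candidate key frame indices:", key_frames(Y, centers))
```

In a real pipeline, each selected frame index would identify a 2D viseme for which a 3D key face shape is captured. How the 2D visemes are mapped to the 3D face space and how the key shapes are interpolated is not specified in the abstract, so the sketch stops at key frame selection.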