Lip-synching using speaker-specific articulation, shape and appearance models

  • Authors:
  • Gérard Bailly, Oxana Govokhina, Frédéric Elisei, Gaspard Breton

  • Affiliations:
  • Gérard Bailly, Frédéric Elisei: Department of Speech and Cognition, GIPSA-Lab, CNRS & Grenoble University, Saint Martin d'Hères cedex, France
  • Oxana Govokhina: Department of Speech and Cognition, GIPSA-Lab, CNRS & Grenoble University, Saint Martin d'Hères cedex, France, and Orange Labs, Cesson-Sévigné, France
  • Gaspard Breton: Orange Labs, Cesson-Sévigné, France

  • Venue:
  • EURASIP Journal on Audio, Speech, and Music Processing - Special issue on animating virtual speakers or singers from audio: Lip-synching facial animation
  • Year:
  • 2009


Abstract

We describe the control, shape, and appearance models built with an original photogrammetric method that captures the characteristics of speaker-specific facial articulation, anatomy, and texture. Two original contributions are put forward: a trainable trajectory-formation model that predicts the articulatory trajectories of a talking face from phonetic input, and a texture model that computes a texture for each 3D facial shape according to articulation. Using motion-capture data from different speakers and module-specific evaluation procedures, we show that this cloning system restores detailed idiosyncrasies and the global coherence of visible articulation. Results of a subjective evaluation of the complete system against competing trajectory-formation models are also presented and discussed.
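
The central step named in the abstract, mapping phonetic input to articulatory trajectories of a talking face, can be illustrated with a minimal sketch. The code below is not the authors' trainable trajectory-formation model; it only shows the general shape of the task under invented assumptions: hypothetical per-phone articulatory targets (PHONE_TARGETS), a fixed 10 ms frame rate, and moving-average smoothing standing in for a trained model.

```python
# Illustrative sketch only -- NOT the paper's trajectory-formation model.
# It maps a phonetic input (phone labels with durations) to smooth
# trajectories of articulatory parameters. Targets, parameter names, and
# the smoothing scheme are invented placeholders.
import numpy as np

# Hypothetical per-phone articulatory targets (e.g. lip opening, lip width),
# in arbitrary normalized units.
PHONE_TARGETS = {
    "p": np.array([0.0, 0.5]),   # bilabial closure
    "a": np.array([1.0, 0.6]),   # open vowel
    "u": np.array([0.4, 0.1]),   # rounded vowel
}

def trajectory_from_phones(phones, durations_ms, frame_ms=10.0, window=5):
    """Build a frame-rate articulatory trajectory from (phone, duration) input.

    Each phone contributes a constant target over its duration; a moving
    average then smooths the resulting step sequence into continuous
    trajectories, standing in for a proper trajectory-formation model.
    """
    frames = []
    for phone, dur in zip(phones, durations_ms):
        n_frames = max(1, int(round(dur / frame_ms)))
        frames.append(np.tile(PHONE_TARGETS[phone], (n_frames, 1)))
    traj = np.concatenate(frames, axis=0)

    # Simple moving-average smoothing along time, per articulatory parameter.
    kernel = np.ones(window) / window
    smoothed = np.column_stack(
        [np.convolve(traj[:, k], kernel, mode="same") for k in range(traj.shape[1])]
    )
    return smoothed

if __name__ == "__main__":
    traj = trajectory_from_phones(["p", "a", "u"], [80, 150, 120])
    print(traj.shape)  # (number of 10 ms frames, number of articulatory parameters)
```

In the system the abstract describes, this stage would instead be learned from the speaker-specific motion-capture data, and its output would drive the shape and appearance (texture) models rather than be printed directly.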