Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model

  • Authors:
  • Salil Deena, Aphrodite Galata

  • Affiliations:
  • School of Computer Science, University of Manchester, Manchester, United Kingdom M13 9PL

  • Venue:
  • ISVC '09 Proceedings of the 5th International Symposium on Advances in Visual Computing: Part I
  • Year:
  • 2009


Abstract

In this work, facial animation is synthesised by modelling the mapping between facial motion and speech using a shared Gaussian process latent variable model (SGPLVM). The audio and visual data are first processed separately and then coupled to yield a shared latent space. Placing a dynamical model on this latent space allows coarticulation to be modelled. Novel animation is synthesised by first inferring intermediate latent points from the audio data and then using a Gaussian process mapping to predict the corresponding visual features. Statistical evaluation of the generated visual features against ground-truth data compares favourably with established methods of speech animation, and the generated videos show proper synchronisation with the audio and exhibit correct facial dynamics.
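The final step of the pipeline described in the abstract — predicting visual features from latent points via a Gaussian process mapping — can be sketched as standard GP posterior-mean prediction. The snippet below is a minimal illustration, not the authors' implementation: the kernel choice (squared-exponential), hyperparameter values, and function names are all assumptions for the sake of a self-contained example.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between the rows of A and B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_predict(X_latent, Y_visual, X_new, noise=1e-4):
    """Posterior mean of a GP mapping latent points to visual features.

    X_latent : (N, q) training latent points (e.g. inferred from audio)
    Y_visual : (N, d) corresponding visual feature vectors
    X_new    : (M, q) novel latent points to animate
    Returns an (M, d) array of predicted visual features.
    """
    K = rbf_kernel(X_latent, X_latent) + noise * np.eye(len(X_latent))
    K_star = rbf_kernel(X_new, X_latent)
    # Posterior mean: K* (K + noise I)^{-1} Y
    return K_star @ np.linalg.solve(K, Y_visual)
```

With a small noise term the posterior mean interpolates the training data, so predictions at the training latent points recover the training visual features almost exactly; at intermediate latent points (e.g. those produced by the dynamical model during synthesis) it yields smoothly varying visual features.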