Visual speech synthesis by modelling coarticulation dynamics using a non-parametric switching state-space model

Authors:
Salil Deena;Shaobo Hou;Aphrodite Galata
Affiliations:
University of Manchester, UK;University of Manchester, UK;University of Manchester, UK
Venue:
International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
Year:
2010


Abstract

We present a novel approach to speech-driven facial animation using a non-parametric switching state-space model based on Gaussian processes. The model extends the shared Gaussian process dynamical model by augmenting it with switching states. Audio and visual data from a talking head corpus are jointly modelled using the proposed method. The switching states are found using variable-length Markov models trained on labelled phonetic data. We also propose a synthesis technique that takes into account both previous and future phonetic context, thus accounting for coarticulatory effects in speech.
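To make the idea of a variable-length Markov model over phonetic labels concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple count threshold for deciding which contexts to keep, whereas published variable-length Markov model learners typically use a statistical criterion. The function names `train_vlmm` and `predict`, the parameters `max_order` and `min_count`, and the toy phoneme sequences are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def train_vlmm(sequences, max_order=3, min_count=2):
    """Count next-symbol frequencies for every context of length 0..max_order.

    Contexts observed fewer than min_count times are dropped (an assumed,
    simplified pruning rule), so the usable memory length varies with the data.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for order in range(min(i, max_order) + 1):
                context = tuple(seq[i - order:i])
                counts[context][symbol] += 1
    # The empty context is always kept as a fallback.
    return {ctx: c for ctx, c in counts.items()
            if len(ctx) == 0 or sum(c.values()) >= min_count}


def predict(model, history, max_order=3):
    """Return (distribution over next symbol, context used).

    The longest suffix of `history` that survived pruning is used as context.
    """
    for order in range(min(len(history), max_order), -1, -1):
        context = tuple(history[len(history) - order:])
        if context in model:
            total = sum(model[context].values())
            dist = {s: n / total for s, n in model[context].items()}
            return dist, context
    raise ValueError("model is empty")


# Toy phoneme label sequences (hypothetical data, not from the paper's corpus).
phoneme_sequences = [
    ["s", "p", "iy", "ch"],
    ["s", "p", "iy", "k"],
    ["b", "iy", "t"],
]
vlmm = train_vlmm(phoneme_sequences, max_order=2, min_count=2)

# A well-attested history uses two phonemes of context ...
dist_long, ctx_long = predict(vlmm, ["s", "p", "iy"], max_order=2)
# ... while a rarely seen one backs off to a single phoneme.
dist_short, ctx_short = predict(vlmm, ["b", "iy"], max_order=2)
```

In this sketch the context `("p", "iy")` is retained because it occurs twice, while `("b", "iy")` occurs once and is pruned, so prediction after "b iy" falls back to the shorter context `("iy",)`. This illustrates only how the amount of phonetic history can adapt to the training data; how such contexts map to switching states in the paper's model is not specified in the abstract.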