Model-based synthesis of visual speech movements from 3D video

Authors:
James D. Edge;Adrian Hilton;Philip Jackson
Affiliations:
Centre for Vision, Speech and Signal Processing, The University of Surrey, Surrey, UK;Centre for Vision, Speech and Signal Processing, The University of Surrey, Surrey, UK;Centre for Vision, Speech and Signal Processing, The University of Surrey, Surrey, UK
Venue:
SIGGRAPH '09: Posters
Year:
2009

Abstract

We describe a method for synthesising visual speech movements using a hybrid unit-selection/model-based approach. Lip movements during speech are captured with a 3D stereo face capture system and segmented into phonetic units. From this data we construct a dynamic parameterisation that preserves the relationship between lip shapes and velocities; within this parameterisation we build a model of how lips move and use it to animate visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting, during synthesis, only the stored phonetic units most similar to the target utterance. By combining properties of model-based synthesis (e.g., HMMs, neural networks) with unit selection, we improve the quality of the resulting visual speech synthesis.
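
As a rough illustration of two ideas in the abstract, the sketch below pairs each lip-shape frame with a velocity estimate and picks the stored phonetic units closest to a target. It is not the authors' implementation: the function names, the fixed-length audio feature vectors, the forward-difference velocity, and the Euclidean distance measure are all assumptions made for the example.

```python
import math


def shape_velocity_features(shapes, dt=1.0):
    """Pair each lip-shape frame with a finite-difference velocity.

    shapes: list of equal-length parameter vectors, one per frame
            (hypothetical representation of captured lip shapes).
    Returns one [shape..., velocity...] vector per frame.
    """
    features = []
    for t, frame in enumerate(shapes):
        # Forward difference where a next frame exists; at the final
        # frame, reuse the difference from the previous frame.
        a, b = (t, t + 1) if t + 1 < len(shapes) else (t - 1, t)
        velocity = [(y - x) / dt for x, y in zip(shapes[a], shapes[b])]
        features.append(list(frame) + velocity)
    return features


def select_similar_units(target_audio, unit_audio, k=3):
    """Return indices of the k stored units nearest to the target.

    Similarity is assumed to be Euclidean distance between
    fixed-length audio feature vectors; ties keep stored order.
    """
    order = sorted(
        range(len(unit_audio)),
        key=lambda i: math.dist(target_audio, unit_audio[i]),
    )
    return order[:k]
```

In a pipeline of this kind, only the trajectories of the selected units would be passed on to the motion model, which is one way of narrowing the ambiguous audio-to-lip mapping described above.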