Dynamic units of visual speech

  • Authors:
  • Sarah L. Taylor (University of East Anglia, Norwich, England); Moshe Mahler (Disney Research, Pittsburgh); Barry-John Theobald (University of East Anglia, Norwich, England); Iain Matthews (Disney Research, Pittsburgh)

  • Venue:
  • SCA '12 Proceedings of the 11th ACM SIGGRAPH / Eurographics Symposium on Computer Animation
  • Year:
  • 2012

Abstract

We present a new method for generating a dynamic, concatenative unit of visual speech that can produce realistic visual speech animation. We redefine visemes as temporal units that describe distinctive movements of the visual speech articulators. Traditionally, visemes have been defined as a set of static mouth shapes representing clusters of contrastive phonemes (e.g. /p, b, m/ and /f, v/). In this work, the motion of the visual speech articulators is used instead to generate discrete, dynamic visual speech gestures. These gestures are clustered to provide a finite set of movements that describe visual speech: the visemes. Dynamic visemes are applied to speech animation by simply concatenating viseme units. We compare dynamic visemes with static visemes in a subjective evaluation and find that, given phonetically annotated audio, dynamic visemes produce more accurate and visually pleasing speech animation, reducing the time an animator must spend manually refining the animation.
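The pipeline the abstract describes (segment articulator motion into gestures, cluster them into a finite viseme inventory, then animate by concatenating cluster exemplars) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the resampling length, the use of plain k-means, and all function names here are assumptions made for the example.

```python
# Toy sketch of a dynamic-viseme pipeline (assumed details, not the paper's method):
# 1. resample variable-length articulator-motion segments to a fixed length,
# 2. cluster them into k "dynamic visemes" with naive k-means,
# 3. animate a viseme sequence by concatenating cluster exemplar motions.
import numpy as np

def resample(segment, n=10):
    """Linearly resample a (frames x dims) trajectory to n frames."""
    t_old = np.linspace(0.0, 1.0, len(segment))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t_new, t_old, segment[:, d])
                     for d in range(segment.shape[1])], axis=1)

def cluster_gestures(gestures, k, iters=50, seed=0):
    """Cluster gestures into k dynamic visemes; return labels and exemplars."""
    X = np.stack([resample(g).ravel() for g in gestures])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each gesture to its nearest exemplar, then recompute means.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def animate(viseme_sequence, centers, n=10, dims=2):
    """Concatenate the exemplar motion of each viseme, in order."""
    return np.concatenate([centers[v].reshape(n, dims) for v in viseme_sequence])
```

Each cluster exemplar is itself a short motion trajectory rather than a static mouth shape, which is the key distinction the paper draws between dynamic and traditional static visemes.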