Accurate Visible Speech Synthesis Based on Concatenating Variable Length Motion Capture Data

  • Authors: Jiyong Ma, Ron Cole, Bryan Pellom, Wayne Ward, Barbara Wise
  • Affiliations: IEEE; IEEE; IEEE; -; -
  • Venue: IEEE Transactions on Visualization and Computer Graphics
  • Year: 2006

Abstract

We present a novel approach to synthesizing accurate visible speech by searching for and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique that automatically maps the facial motions observed on the source face to the target face. To model long-distance coarticulation effects in visible speech, a large-scale corpus covering the most common syllables in English was collected, annotated, and analyzed. For any input text, a search algorithm that locates the optimal sequence of concatenation units for synthesis is described. A new algorithm that adapts lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system was implemented based on this approach; it is currently used in more than 60 kindergarten through third grade classrooms to teach students to read with a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations were conducted. The results show that the proposed approach is accurate and effective for visible speech synthesis.
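To make the unit-selection idea concrete, the sketch below shows a generic Viterbi-style dynamic-programming search over variable-length candidate units that minimizes a combined target and concatenation cost. This is a minimal illustration under stated assumptions, not the paper's implementation: the Unit representation, the boundary-frame concatenation cost, the corpus indexed by syllable spans, and the weight w_concat are all illustrative choices.

```python
# Illustrative sketch (not the authors' implementation) of unit selection over
# variable-length motion-capture units via dynamic programming.
# Assumptions: each candidate unit covers a syllable span and stores a motion
# trajectory (frames x features) plus a target cost; the concatenation cost is
# the distance between adjoining boundary frames of consecutive units.
from dataclasses import dataclass
import numpy as np


@dataclass
class Unit:
    syllables: tuple          # syllable span this unit covers, e.g. ("hel", "lo")
    motion: np.ndarray        # (num_frames, num_features) lip/face parameters
    target_cost: float        # mismatch with the requested phonetic context


def concat_cost(a: Unit, b: Unit) -> float:
    """Smoothness penalty: distance between the last frame of a and first frame of b."""
    return float(np.linalg.norm(a.motion[-1] - b.motion[0]))


def select_units(syllables: list, corpus: dict, w_concat: float = 1.0) -> list:
    """Viterbi-style search. State = (number of syllables covered, unit ending there);
    transition cost = next unit's target cost + weighted concatenation cost.
    `corpus` maps syllable-span tuples to lists of candidate Units."""
    n = len(syllables)
    # states[j] maps id(ending unit) -> (best cost, best path) covering syllables[:j]
    states = [{} for _ in range(n + 1)]
    states[0][0] = (0.0, [])  # sentinel start state with no previous unit
    for i in range(n):
        for cost_i, path_i in states[i].values():
            prev = path_i[-1] if path_i else None
            # try every variable-length unit whose syllable span starts at position i
            for j in range(i + 1, n + 1):
                span = tuple(syllables[i:j])
                for unit in corpus.get(span, []):
                    cost = cost_i + unit.target_cost
                    if prev is not None:
                        cost += w_concat * concat_cost(prev, unit)
                    key = id(unit)
                    if key not in states[j] or cost < states[j][key][0]:
                        states[j][key] = (cost, path_i + [unit])
    if not states[n]:
        raise ValueError("no unit sequence covers the input syllables")
    return min(states[n].values(), key=lambda cp: cp[0])[1]
```

One appeal of searching over variable-length units, as in the approach described above, is that longer matching spans from the corpus incur fewer concatenation joins, which helps preserve coarticulation effects that span syllable boundaries.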