This work presents an integrated system capable of generating animations with realistic dynamics, including individualized nuances, of three-dimensional (3-D) human faces driven by speech acoustics. The system captures short-lived phenomena in the orofacial dynamics of a given speaker by tracking the 3-D location of various MPEG-4 facial points through stereovision. A perceptual transformation of the speech spectral envelope and prosodic cues are combined into an acoustic feature vector used to predict 3-D orofacial dynamics by means of a nearest-neighbor algorithm. The Karhunen-Loève transformation is used to identify the principal components of orofacial motion, decoupling perceptually natural components from experimental noise. We also present a highly optimized MPEG-4-compliant player capable of generating audio-synchronized animations at 60 frames/s. The player is based on a pseudo-muscle model augmented with a nonpenetrable ellipsoidal structure that approximates the skull and the jaw. This structure adds a sense of volume that yields more realistic dynamics than existing simplified pseudo-muscle-based approaches, yet it is simple enough to run at the desired frame rate. Experimental results on an audiovisual database of compact TIMIT sentences illustrate the performance of the complete system.
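The two core mapping steps described above, a Karhunen-Loève (PCA) decomposition that keeps the dominant components of facial motion while discarding experimental noise, and a nearest-neighbor lookup from acoustic features to 3-D facial poses, can be sketched as follows. This is a minimal illustration with synthetic data, not the paper's implementation: the array names, dimensions, and the number of retained components are assumptions chosen for the example.

```python
import numpy as np

# Toy stand-in for an audiovisual corpus: each frame pairs an acoustic
# feature vector with the 3-D coordinates of tracked facial points
# (sizes here are illustrative assumptions, not the paper's values).
rng = np.random.default_rng(0)
n_frames, n_acoustic, n_facial = 200, 16, 30
acoustic = rng.normal(size=(n_frames, n_acoustic))
facial = rng.normal(size=(n_frames, n_facial))

# Karhunen-Loève transform (PCA): eigendecompose the covariance of the
# facial trajectories and keep only the leading components, treating the
# remainder as experimental noise.
mean = facial.mean(axis=0)
centered = facial - mean
cov = centered.T @ centered / (n_frames - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]      # components by descending variance
k = 5                                  # retained components (assumed)
basis = eigvecs[:, order[:k]]          # (n_facial, k) projection basis
coeffs = centered @ basis              # low-dimensional motion codes

def predict_facial(query):
    """Nearest-neighbor mapping: find the training frame whose acoustic
    features are closest to the query and return its denoised facial
    pose, reconstructed from the k retained KL components."""
    dists = np.linalg.norm(acoustic - query, axis=1)
    i = np.argmin(dists)
    return mean + coeffs[i] @ basis.T

pose = predict_facial(acoustic[10])    # pose for one acoustic frame
```

In the full system the query vector would hold the perceptually transformed spectral envelope and prosodic cues for each audio frame, and the reconstructed pose would drive the MPEG-4 facial points of the player.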