Real-time speech-driven face animation with expressions using neural networks

Authors:
Pengyu Hong; Zhen Wen; T. S. Huang
Affiliations:
Beckman Inst. for Adv. Sci. & Technol., Illinois Univ., Urbana, IL;-;-
Venue:
IEEE Transactions on Neural Networks
Year:
2002

Abstract

A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expressions using neural networks. Based on this framework, we learn a quantitative visual representation of facial deformations, called motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm, which is used to collect an audio-visual training database. We then construct a real-time audio-to-MUP mapping by training a set of neural networks on this database. A quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we add real-time speech-driven face animation with expressions to the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable to a real face in how effectively it influences bimodal human emotion perception.
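
The linear MU/MUP representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the use of a neutral "mean shape" as the offset, and the assumption that the MUs are orthonormal (so that MUPs can be recovered by projection) are all hypothetical choices made for illustration. The abstract states only that a deformation is approximated by a linear combination of MUs weighted by MUPs.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def synthesize(mean_shape: Vector, motion_units: List[Vector], mups: Vector) -> Vector:
    """Approximate a face shape as mean_shape + sum_i mups[i] * motion_units[i].

    The mean_shape offset is an assumption of this sketch, not something
    stated in the abstract.
    """
    if len(mups) != len(motion_units):
        raise ValueError("need exactly one MUP per MU")
    shape = list(mean_shape)
    for weight, mu in zip(mups, motion_units):
        if len(mu) != len(shape):
            raise ValueError("each MU must match the shape dimension")
        for k, component in enumerate(mu):
            shape[k] += weight * component
    return shape


def estimate_mups(mean_shape: Vector, motion_units: List[Vector], observed: Vector) -> Vector:
    """Recover MUPs by projecting the observed deformation onto each MU.

    This projection is only exact when the MUs are orthonormal, which is
    an assumption of this sketch.
    """
    residual = [o - m for o, m in zip(observed, mean_shape)]
    return [dot(residual, mu) for mu in motion_units]


# Toy data (hypothetical): two 2-D feature points flattened into a 4-D shape vector.
MEAN_SHAPE = [0.0, 0.0, 1.0, 0.0]
MOTION_UNITS = [
    [1.0, 0.0, 0.0, 0.0],  # moves the first point horizontally
    [0.0, 0.0, 0.0, 1.0],  # moves the second point vertically
]

frame = synthesize(MEAN_SHAPE, MOTION_UNITS, [0.5, -0.25])
recovered = estimate_mups(MEAN_SHAPE, MOTION_UNITS, frame)
print(frame)      # [0.5, 0.0, 1.0, -0.25]
print(recovered)  # [0.5, -0.25]
```

In a speech-driven setting, the MUP vector passed to `synthesize` would come from the trained audio-to-MUP mapping instead of being set by hand. That mapping is not sketched here because the abstract gives no detail on the network inputs or architecture.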