On creating multimodal virtual humans--real time speech driven facial gesturing

Authors:
Goranka Zoric;Robert Forchheimer;Igor S. Pandzic
Affiliations:
Department of Telecommunications, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia HR-10 000;Department of Electrical Engineering, Linköping University, Linköping, Sweden 581 83;Department of Telecommunications, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia HR-10 000
Venue:
Multimedia Tools and Applications
Year:
2011

Abstract

Because computing devices are now used so widely, human-computer interaction design is moving towards user-centric interfaces, which incorporate the different modalities that humans use in everyday communication. Virtual humans that look and behave believably fit well with the goal of designing interfaces that are more natural, effective, and socially oriented. In this paper we present a novel method for automatic speech-driven facial gesturing for virtual humans that is capable of real-time performance. The facial gestures included are various nods and head movements, blinks, eyebrow gestures, and gaze. The mapping from speech to facial gestures is based on prosodic information obtained from the speech signal and is realized using a hybrid approach that combines Hidden Markov Models, rules, and global statistics. We further test the method using an application prototype: a system for speech-driven facial gesturing suitable for virtual presenters. Subjective evaluation of the system confirmed that the synthesized facial movements are consistent and time-aligned with the underlying speech, and thus provide natural behavior of the whole face.