This paper describes SynFace, a supportive technology that aims to enhance audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with a focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for the Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).
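The core architecture described above, in which a phonetic recogniser maps incoming audio to control parameters for an animated face, can be sketched roughly as below. This is a minimal illustrative sketch only: all names (`recognise_phonemes`, `PHONEME_TO_VISEME`, `drive_face`) and the phoneme-to-viseme mapping are hypothetical and do not correspond to the actual SynFace implementation or API.

```python
# Hypothetical sketch of a speech-driven talking-head pipeline:
# audio frames -> phoneme labels -> viseme (mouth-shape) parameters.
# None of these names or mappings come from SynFace itself.

from typing import Dict, List

# Illustrative mapping from recognised phonemes to viseme classes
# that a 3D face model could render.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "i": "spread", "u": "rounded",
    "sil": "neutral",
}

def recognise_phonemes(audio_frames: List[bytes]) -> List[str]:
    """Stand-in for the phonetic recogniser: one label per frame.

    A real system would run an acoustic model here; this stub
    simply labels every frame as silence.
    """
    return ["sil"] * len(audio_frames)

def drive_face(audio_frames: List[bytes]) -> List[str]:
    """Convert audio frames to a per-frame viseme sequence."""
    phonemes = recognise_phonemes(audio_frames)
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

frames = [b"\x00" * 320] * 5   # five dummy 20 ms audio frames
print(drive_face(frames))       # one viseme label per frame
```

In a real-time setting, the recogniser's latency determines how far the face animation lags the audio, which is why the abstract stresses a recogniser specifically optimised for this task.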