Several puppetry techniques have recently been proposed to transfer emotional facial expressions from a user's video to an avatar. Whereas the generation of emotional facial expressions may tolerate small tracking errors, the generation of speech-related facial movements is severely impaired by them. Since incongruent facial movements can drastically influence speech perception, we proposed a more effective method to transfer speech-related facial movements from a user to an avatar. After a facial tracking phase, the speech articulatory parameters controlling the jaw and the lips were estimated from the set of landmark positions. Two additional processes computed the articulatory parameters controlling the eyelids and the tongue from the 2D Discrete Cosine Transform (DCT) coefficients of the eye and inner-mouth image regions. A speech-in-noise perception experiment with 25 participants was conducted to evaluate the system. An increase in intelligibility was found for both the avatar and the human auditory-visual conditions compared to their respective auditory-only conditions. For the avatar auditory-visual presentation, the results depended on the vocalic context: all consonants were better perceived in the /a/ vocalic context than in /i/ and /u/, owing to the lack of depth information recoverable from video. This method could be used to accurately animate avatars for hearing-impaired people in information technology and telecommunication applications.
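As a rough illustration of the eye and inner-mouth feature extraction mentioned above, the Python sketch below computes a low-frequency 2D DCT descriptor from a cropped grayscale image region. It assumes NumPy and SciPy are available; the patch normalization, the number of retained coefficients, and the function name are hypothetical choices for illustration, not values taken from the paper.

    import numpy as np
    from scipy.fft import dctn

    def dct_features(region, n_coeffs=8):
        # 'region' is a 2D grayscale patch cropped around the eyes or the
        # inner mouth, with pixel values in [0, 255]. 'n_coeffs' is a
        # hypothetical choice of how many low-frequency coefficients to
        # keep along each axis.
        patch = np.asarray(region, dtype=np.float64) / 255.0
        # 2D type-II DCT with orthonormal scaling.
        coeffs = dctn(patch, norm="ortho")
        # Keep the top-left (low-frequency) block as the feature vector.
        return coeffs[:n_coeffs, :n_coeffs].ravel()

Such a descriptor could then be mapped, for example by a learned regression, to the eyelid and tongue articulatory parameters driving the avatar; the abstract does not specify the exact mapping used in the paper.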