Several puppetry techniques have recently been proposed to transfer emotional facial expressions from a user's video to an avatar. Whereas the generation of emotional facial expressions may tolerate small tracking errors, the generation of speech-related facial movements is severely impaired by them. Since incongruent facial movements can drastically influence speech perception, we proposed a more effective method to transfer speech-related facial movements from a user to an avatar. After a facial tracking phase, the speech articulatory parameters controlling the jaw and the lips were estimated from the set of landmark positions. Two additional processes computed the articulatory parameters controlling the eyelids and the tongue from the 2D Discrete Cosine Transform (DCT) coefficients of the eye and inner-mouth image regions. A speech-in-noise perception experiment with 25 participants was conducted to evaluate the system. An increase in intelligibility was found for both the avatar and the human auditory-visual conditions compared to their respective auditory-only conditions. For the avatar auditory-visual presentation, the results depended on the vocalic context: all consonants were better perceived in the /a/ vocalic context than in /i/ and /u/, owing to the lack of depth information recoverable from video. This method could be used to accurately animate avatars for hearing-impaired people in information technology and telecommunication applications.
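As a rough illustration of the eye and inner-mouth feature extraction mentioned above, the Python sketch below computes a low-frequency 2D DCT descriptor from a cropped grayscale image region. It assumes NumPy and SciPy are available; the patch normalization, the number of retained coefficients, and the function name are hypothetical choices for illustration, not values taken from the paper.

    import numpy as np
    from scipy.fft import dctn

    def dct_features(region, n_coeffs=8):
        # 'region' is a 2D grayscale patch cropped around the eyes or the
        # inner mouth, with pixel values in [0, 255]. 'n_coeffs' is a
        # hypothetical choice of how many low-frequency coefficients to
        # keep along each axis.
        patch = np.asarray(region, dtype=np.float64) / 255.0
        # 2D type-II DCT with orthonormal scaling.
        coeffs = dctn(patch, norm="ortho")
        # Keep the top-left (low-frequency) block as the feature vector.
        return coeffs[:n_coeffs, :n_coeffs].ravel()

Such a descriptor could then be mapped, for example by a learned regression, to the eyelid and tongue articulatory parameters driving the avatar; the abstract does not specify the exact mapping used in the paper.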