Creating Interactive Virtual Humans: Some Assembly Required
IEEE Intelligent Systems
E-Partner: A Photo-Realistic Conversation Agent
PCM '01 Proceedings of the Second IEEE Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Photo-realistic conversation agent
Integrated image and graphics technologies
Towards perceptually realistic talking heads: models, methods and McGurk
APGV '04 Proceedings of the 1st Symposium on Applied perception in graphics and visualization
Surface detail capturing for realistic facial animation
Journal of Computer Science and Technology - Special issue on computer graphics and computer-aided design
Video-based character animation
Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation
Transferable videorealistic speech animation
Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation
Toward Perceptually Realistic Talking Heads: Models, Methods, and McGurk
ACM Transactions on Applied Perception (TAP)
Data Fusion and Multicue Data Matching by Diffusion Maps
IEEE Transactions on Pattern Analysis and Machine Intelligence
Facial animation in a nutshell: past, present and future
SAICSIT '06 Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
MATCHKiosk: a multimodal interactive city guide
ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
A coupled HMM approach to video-realistic speech animation
Pattern Recognition
Audio-visual speech processing: progress and challenges
VisHCI '06 Proceedings of the HCSNet workshop on Use of vision in human-computer interaction - Volume 56
Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis
MLMI '08 Proceedings of the 5th international workshop on Machine Learning for Multimodal Interaction
On the importance of audiovisual coherence for the perceived quality of synthesized visual speech
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on animating virtual speakers or singers from audio: Lip-synching facial animation
Optimization of an image-based talking head system
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on animating virtual speakers or singers from audio: Lip-synching facial animation
Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing
Dynamic units of visual speech
EUROSCA'12 Proceedings of the 11th ACM SIGGRAPH / Eurographics conference on Computer Animation
This paper describes a system for creating a photo-realistic model of the human head that can be animated and lip-synched from phonetic transcripts of text. Combined with a state-of-the-art text-to-speech (TTS) synthesizer, it generates video animations of talking heads that closely resemble real people. To obtain a natural-looking head, we choose a “data-driven” approach: we record a talking person and apply image recognition to automatically extract bitmaps of facial parts. These bitmaps are normalized and parameterized before being entered into a database. For synthesis, the TTS provides the audio track as well as the phonetic transcript, from which trajectories in the space of parameterized bitmaps are computed for all facial parts. Sampling these trajectories and retrieving the corresponding bitmaps from the database produces animated facial parts, which are then projected and blended onto an image of the whole head using its pose information. The resulting talking-head model can produce new, never-recorded speech of the person who was originally recorded. Talking-head animations of this type are useful as a front end for agents and avatars in multimedia applications such as virtual operators, virtual announcers, help desks, and educational and expert systems.
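The synthesis loop in the abstract can be sketched in miniature: map each phoneme to a target point in the parameter space, interpolate a trajectory through those targets, and for each sampled frame retrieve the recorded bitmap whose parameters lie closest. This is a toy illustration, not the paper's implementation: the database, the one-dimensional "openness" parameter, the phoneme targets, and all function names are invented for clarity.

```python
# Toy sketch of the trajectory-sampling synthesis described above.
# A real system uses multi-dimensional parameter vectors per facial
# part; here each recorded mouth bitmap carries a single invented
# "openness" value so the retrieval step stays readable.
MOUTH_DB = [
    {"id": "frame_00", "openness": 0.0},
    {"id": "frame_01", "openness": 0.3},
    {"id": "frame_02", "openness": 0.6},
    {"id": "frame_03", "openness": 1.0},
]

# Invented per-phoneme targets (a real system derives these from data).
PHONEME_TARGETS = {"sil": 0.0, "m": 0.1, "a": 1.0, "o": 0.6}

def trajectory(phonemes, frames_per_phone=3):
    """Piecewise-linear parameter trajectory through phoneme targets."""
    targets = [PHONEME_TARGETS[p] for p in phonemes]
    traj = []
    for start, end in zip(targets, targets[1:]):
        for i in range(frames_per_phone):
            t = i / frames_per_phone
            traj.append(start + t * (end - start))
    traj.append(targets[-1])
    return traj

def nearest_bitmap(value):
    """Retrieve the recorded bitmap whose parameter is closest."""
    return min(MOUTH_DB, key=lambda e: abs(e["openness"] - value))["id"]

def synthesize(phonemes):
    """One bitmap id per output frame, ready for compositing."""
    return [nearest_bitmap(v) for v in trajectory(phonemes)]

frames = synthesize(["sil", "m", "a", "o", "sil"])
print(frames)
```

In the full pipeline, each retrieved bitmap would then be warped to the head's pose and alpha-blended onto the base head image; the nearest-neighbor lookup here stands in for whatever unit-selection cost the real system optimizes.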