We present MikeTalk, a text-to-audiovisual speech synthesizer that converts input text into an audiovisual speech stream. MikeTalk is built from visemes, a small set of images spanning the range of mouth shapes that occur in speech. The visemes are acquired from a recorded visual corpus of a human subject, designed specifically to elicit one instantiation of each viseme. Using optical flow methods, correspondences from every viseme to every other viseme are computed automatically. By morphing along such a correspondence, a smooth transition between two viseme images can be generated, and a complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer determines which viseme transitions to use and the rate at which the morphing proceeds. In this manner the visual speech stream is synchronized with the audio speech stream, giving the impression of a photorealistic talking face.
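The morphing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a precomputed dense flow field mapping pixels of the first viseme image to the second (as an optical flow method would produce), and it approximates the forward warp with a simpler backward warp. The helper names `bilinear_sample` and `morph_frame` are introduced here for illustration.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample a grayscale image at fractional (xs, ys) coordinates
    with bilinear interpolation, clamping at the borders."""
    h, w = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    dx = np.clip(xs - x0, 0.0, 1.0)
    dy = np.clip(ys - y0, 0.0, 1.0)
    return (img[y0, x0] * (1 - dx) * (1 - dy)
            + img[y0, x0 + 1] * dx * (1 - dy)
            + img[y0 + 1, x0] * (1 - dx) * dy
            + img[y0 + 1, x0 + 1] * dx * dy)

def morph_frame(im0, im1, flow, alpha):
    """One intermediate frame of a viseme transition at alpha in [0, 1].

    flow[..., 0] / flow[..., 1] hold the x / y displacement that carries
    each pixel of im0 to its corresponding pixel in im1.  Both images are
    warped part-way along the flow and then cross-dissolved; a backward
    warp stands in for the true forward warp, a common simplification.
    """
    h, w = im0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    warped0 = bilinear_sample(im0, xs - alpha * flow[..., 0],
                                   ys - alpha * flow[..., 1])
    warped1 = bilinear_sample(im1, xs + (1 - alpha) * flow[..., 0],
                                   ys + (1 - alpha) * flow[..., 1])
    return (1 - alpha) * warped0 + alpha * warped1
```

A full transition is then just `[morph_frame(im0, im1, flow, a) for a in np.linspace(0, 1, n)]`, with `n` chosen from the phoneme durations reported by the text-to-speech synthesizer; at `alpha = 0` the frame reduces to the first viseme image and at `alpha = 1` to the second.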