Video Rewrite: Driving Visual Speech with Audio. Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '97).
Codebook-Based Face Point Trajectory Synthesis Algorithm Using Speech Input. Speech Communication.
Visual Speech Synthesis by Morphing Visemes. International Journal of Computer Vision, special issue on learning and vision at the Center for Biological and Computational Learning, Massachusetts Institute of Technology.
IEEE Transactions on Pattern Analysis and Machine Intelligence.
Trainable Videorealistic Speech Animation. Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '02).
Classifying Visemes for Automatic Lipreading. Proceedings of the Second International Workshop on Text, Speech and Dialogue (TSD '99).
Speech-Driven Face Synthesis from 3D Video. Proceedings of the 2nd International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT '04).
Proceedings of the 6th International Conference on Multimodal Interfaces.
Facial Animation Based on Context-Dependent Visemes. Computers and Graphics (Technical Section).
Design, Implementation and Evaluation of the Czech Realistic Audio-Visual Speech Synthesis. Signal Processing, special section on multimodal human-computer interfaces.
Unit Selection in a Concatenative Speech Synthesis System Using a Large Speech Database. Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), Volume 1.
Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis. Proceedings of the 5th International Workshop on Machine Learning for Multimodal Interaction (MLMI '08).
On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech. EURASIP Journal on Audio, Speech, and Music Processing, special issue on animating virtual speakers or singers from audio: lip-synching facial animation.
Emphatic Visual Speech Synthesis. IEEE Transactions on Audio, Speech, and Language Processing, special issue on multimodal processing in speech-based interactions.
Compact 2D Facial Animation Based on Context-Dependent Visemes. Proceedings of the SSPNET 2nd International Symposium on Facial Analysis and Animation.
Realistic Facial Expression Synthesis for an Image-Based Talking Head. Proceedings of the 2011 IEEE International Conference on Multimedia and Expo (ICME '11).
Least Squares Quantization in PCM. IEEE Transactions on Information Theory.
Dynamic Units of Visual Speech. Proceedings of the 11th ACM SIGGRAPH / Eurographics Conference on Computer Animation (SCA '12).
The use of visemes as atomic speech units in visual speech analysis and synthesis systems is well established. Viseme labels are conventionally determined by a many-to-one phoneme-to-viseme mapping. However, because of visual coarticulation effects, an accurate mapping from phonemes to visemes should instead be many-to-many. In this research it was found that neither standardized nor speaker-dependent many-to-one viseme labels satisfy the quality requirements of concatenative visual speech synthesis. Therefore, a novel technique for defining a many-to-many phoneme-to-viseme mapping is introduced, which combines tree-based and k-means clustering approaches. We show that these many-to-many viseme labels describe the visual speech information more accurately than both phoneme-based and many-to-one viseme-based speech labels. In addition, the use of these many-to-many visemes improves the precision of the segment selection phase in concatenative visual speech synthesis using limited speech databases. Furthermore, the resulting synthetic visual speech was found, both objectively and subjectively, to be of higher quality when the many-to-many visemes are used to describe the speech database and the synthesis targets.
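The core idea of the abstract can be illustrated with a minimal sketch: if each phoneme occurrence in a corpus carries a visual feature vector, clustering those vectors (here with a plain k-means, one of the two clustering approaches the abstract mentions) groups visually similar realizations, and a phoneme whose realizations fall into several clusters receives several viseme labels, yielding a many-to-many mapping. All data, feature dimensions, and the farthest-point initialization below are illustrative assumptions, not the paper's actual method or corpus.

```python
# Hypothetical sketch: a many-to-many phoneme-to-viseme mapping derived
# by k-means clustering of per-instance visual feature vectors.
from collections import defaultdict


def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
        # Recompute centroids; keep the old one if a cluster emptied.
        centroids = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def assign(p, centroids):
    """Viseme label = index of the nearest cluster centroid."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))


# Toy corpus: each phoneme occurs with several visual realizations
# (coarticulation), encoded here as invented 2D mouth-shape features.
instances = [
    ("p", (0.1, 0.1)), ("p", (0.1, 0.2)),   # closed lips
    ("p", (0.8, 0.7)),                      # /p/ in a rounded-vowel context
    ("o", (0.9, 0.8)), ("o", (0.8, 0.9)),   # rounded
    ("a", (0.5, 0.1)), ("a", (0.9, 0.85)),  # open vs. rounded context
]
centroids = kmeans([feat for _, feat in instances], k=2)

# A phoneme whose realizations land in several clusters gets several
# viseme labels: the mapping is many-to-many, not many-to-one.
mapping = defaultdict(set)
for phoneme, feat in instances:
    mapping[phoneme].add(assign(feat, centroids))

for ph in sorted(mapping):
    print(ph, sorted(mapping[ph]))
```

With this toy data, /p/ and /a/ each map to both clusters while /o/ maps to one, which is exactly the asymmetry the abstract argues a many-to-one scheme cannot express.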