Technical Section: Facial animation based on context-dependent visemes

  • Authors: José Mario De Martino, Léo Pini Magalhães, Fábio Violaro

  • Affiliations: Department of Computer Engineering and Industrial Automation, School of Electrical and Computer Engineering, State University of Campinas, 13083-970 - Av. Albert Einstein, 400, Campinas, SP, Brazil; Department of Communications, School of Electrical and Computer Engineering, State University of Campinas, 13083-970 - Av. Albert Einstein, 400, Campinas, SP, Brazil

  • Venue: Computers and Graphics
  • Year: 2006

Abstract

This paper presents a novel approach for generating realistic, speech-synchronized 3D facial animation that copes with anticipatory and perseveratory coarticulation. The methodology is based on the measurement of 3D trajectories of fiduciary points marked on the face of a real speaker during the production of CVCV nonsense words. The trajectories are measured from standard video sequences using stereo-vision photogrammetric techniques. The first stationary point of each trajectory associated with a phonetic segment is selected as its articulatory target. By clustering, according to geometric similarity, all articulatory targets of the same segment in different phonetic contexts, a set of phonetic context-dependent visemes accounting for coarticulation is identified. These visemes then drive a set of geometric transformation/deformation models that reproduce, on the 3D virtual face, the rotation and translation of the temporomandibular joint, as well as the lip behavior of natural articulation, such as protrusion and the width and height of the mouth opening. The approach is being used to generate speech-synchronized 3D animation from both natural speech and synthetic speech produced by a text-to-speech synthesizer.
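The two central steps of the abstract's pipeline, picking the first stationary point of a trajectory as the articulatory target and clustering targets by geometric similarity into context-dependent visemes, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the speed threshold `eps`, the greedy nearest-centroid clustering, and the `radius` parameter are all assumptions; the abstract states that clustering is done by geometric similarity but does not specify the algorithm or its parameters.

```python
import math

def first_stationary_point(traj, eps=0.05):
    """Return the first sample of a 3D point trajectory where the
    displacement between consecutive video frames drops below `eps`,
    i.e. the point has (nearly) stopped moving.

    traj: list of (x, y, z) tuples sampled at the video frame rate.
    eps:  assumed stationarity threshold, in the same units as traj.
    """
    for i in range(1, len(traj)):
        if math.dist(traj[i], traj[i - 1]) < eps:
            return traj[i]
    return traj[-1]  # no stationary point found: fall back to last sample

def cluster_targets(targets, radius=0.3):
    """Greedy geometric clustering of articulatory targets: each target
    joins the first cluster whose centroid lies within `radius`,
    otherwise it starts a new cluster. Each resulting centroid stands
    in for one context-dependent viseme.

    targets: list of (x, y, z) articulatory targets of one phonetic
             segment, collected across different phonetic contexts.
    """
    clusters = []  # each entry is [centroid, member_list]
    for t in targets:
        for c in clusters:
            if math.dist(t, c[0]) < radius:
                c[1].append(t)
                # recompute the centroid as the mean of the members
                c[0] = tuple(sum(p[k] for p in c[1]) / len(c[1])
                             for k in range(3))
                break
        else:
            clusters.append([t, [t]])
    return [c[0] for c in clusters]
```

In this sketch, a segment whose targets fall into two well-separated groups yields two visemes (e.g. one for rounded-vowel contexts, one for spread-vowel contexts), which is exactly the context dependence the paper exploits.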