Real-time 3D talking head from a synthetic viseme dataset

  • Authors:
  • Arthur Niswar, Ee Ping Ong, Hong Thai Nguyen, Zhiyong Huang

  • Affiliations:
  • Institute for Infocomm Research, Singapore (all authors)

  • Venue:
  • Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry
  • Year:
  • 2009

Abstract

In this paper, we describe a simple and fast way to build a 3D talking head that can be used in many applications requiring audiovisual speech animation. The talking head is constructed from a synthetic 3D viseme dataset, which is sufficiently realistic and can be generated with 3D modeling software. To build the talking head, the viseme dataset is first analyzed statistically to obtain the optimal linear parameters for controlling the movements of the lips and jaw of the 3D head model. These parameters correspond to some of the low-level MPEG-4 FAPs (Facial Animation Parameters); hence our method can be used to extract the speech-relevant MPEG-4 FAPs from a dataset of phonemes/visemes. The parameterized head model is then combined with a Text-to-Speech (TTS) system to synthesize audiovisual speech from a given text. To make the talking head look more realistic, eye blinks and eye movements are also animated during speech. We implemented this work in an interactive text-to-audio-visual speech system.
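The abstract does not spell out the statistical analysis used to derive the linear lip/jaw control parameters, but one common way to obtain such a linear basis from a viseme dataset is principal component analysis (PCA) over the vertex coordinates. The sketch below is an illustrative assumption, not the authors' actual procedure: the dataset shape, the random stand-in data, and the choice of PCA are all hypothetical.

```python
import numpy as np

# Hypothetical viseme dataset: V visemes, each stored as a flattened
# vector of N coordinates for the lip/jaw vertices of the head model.
rng = np.random.default_rng(0)
V, N = 14, 3 * 60                  # e.g. 14 visemes, 60 lip/jaw vertices
visemes = rng.normal(size=(V, N))  # stand-in for real 3D modelling data

# Statistical analysis via PCA: subtract the mean shape, then take the
# SVD of the centered data; rows of Vt are the principal deformation modes.
mean_shape = visemes.mean(axis=0)
centered = visemes - mean_shape
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

k = 5                 # keep a few dominant modes as linear control parameters
modes = Vt[:k]        # (k, N) linear basis for lip/jaw deformation

# Each viseme is then approximated by the mean shape plus a small set of
# control parameters, analogous to driving low-level MPEG-4 FAPs.
params = centered @ modes.T              # (V, k) per-viseme parameters
reconstructed = mean_shape + params @ modes
```

With a real dataset, `k` would be chosen so the retained modes explain most of the variance; at animation time, interpolating the `params` vectors between successive visemes yields smooth lip/jaw motion.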