This paper presents an image-based talking head system consisting of two parts: analysis and synthesis. The audiovisual analysis part builds a face model of a recorded human subject, comprising a personalized 3D mask and a large database of mouth images with their associated information. The synthesis part generates natural-looking facial animations from phonetic transcripts of text. A critical step in synthesis is unit selection, which selects and concatenates appropriate mouth images from the database so that they match the spoken words of the talking head. Selection is driven by lip synchronization and by the visual similarity of consecutive images. This paper refines the unit selection and trains it using Pareto optimization. Subjective tests show that most viewers cannot distinguish the resulting facial animations from real video.
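The unit selection described above is a concatenative search: for each frame it must pick one mouth image so that the whole sequence balances a lip-synchronization (target) cost against a smoothness (concatenation) cost between consecutive images. Below is a minimal sketch of such a search as a Viterbi-style dynamic program. The function name, the Euclidean feature distances, and the fixed weights `w_sync` and `w_smooth` are illustrative assumptions, not the paper's actual cost functions; in the paper those weights are what Pareto optimization would be used to tune.

```python
import numpy as np

def select_units(target_features, candidates, w_sync=1.0, w_smooth=1.0):
    """Illustrative Viterbi search over candidate mouth images.

    target_features: list of T desired lip-shape feature vectors, one per frame.
    candidates: list of T arrays; candidates[t] has shape (K_t, D), one feature
                vector per mouth image that could realize frame t.
    Returns one chosen candidate index per frame.
    NOTE: costs and weights are hypothetical stand-ins for the paper's.
    """
    T = len(target_features)

    def sync_cost(t, feats):
        # target cost: mismatch between each candidate and the desired lip shape
        return np.linalg.norm(feats - target_features[t], axis=1)

    def smooth_cost(prev_feats, feats):
        # concatenation cost: visual difference between consecutive images
        return np.linalg.norm(feats[None, :, :] - prev_feats[:, None, :], axis=2)

    cost = w_sync * sync_cost(0, candidates[0])
    back = []
    for t in range(1, T):
        trans = w_smooth * smooth_cost(candidates[t - 1], candidates[t])
        total = cost[:, None] + trans + w_sync * sync_cost(t, candidates[t])[None, :]
        back.append(np.argmin(total, axis=0))  # best predecessor per candidate
        cost = np.min(total, axis=0)

    # backtrack the cheapest path through the candidate lattice
    path = [int(np.argmin(cost))]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

A larger `w_smooth` favors visually continuous mouth sequences at the expense of lip-sync accuracy, and vice versa; the paper's use of Pareto optimization reflects exactly this multi-objective trade-off between the two costs.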