Realistic facial expression synthesis for an image-based talking head

  • Authors:
  • Kang Liu; Joern Ostermann

  • Affiliations:
  • Institut für Informationsverarbeitung, Leibniz Universität Hannover, Appelstr. 9A, 30167 Hannover, Germany (both authors)

  • Venue:
  • ICME '11 Proceedings of the 2011 IEEE International Conference on Multimedia and Expo
  • Year:
  • 2011


Abstract

This paper presents an image-based talking head system that synthesizes realistic facial expressions accompanying speech, given arbitrary text input and control tags for facial expression. Smile is used as an example of a facial expression primitive. First, three types of videos are recorded: a performer speaking without any expression, smiling while speaking, and smiling after speaking. By analyzing the recorded audiovisual data, an expressive database is built containing normalized neutral and smiling mouth images together with their associated features and expression labels. The expressive talking head is then synthesized by a unit selection algorithm, which selects and concatenates appropriate mouth image segments from the expressive database. Experimental results show that the synthesized smiles are objectively as realistic as real ones, and viewers cannot distinguish real smiles from synthesized ones.
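The unit selection step described in the abstract can be viewed as a shortest-path search: each time step offers several candidate mouth-image segments from the database, and the chosen sequence should minimize a combination of target cost (mismatch with the desired phoneme/expression label) and concatenation cost (visual discontinuity between consecutive segments). The sketch below is an illustrative Viterbi-style dynamic program under these assumptions; the cost functions, segment representation, and weighting are hypothetical and not taken from the paper.

```python
def unit_selection(candidates, target_cost, concat_cost):
    """Pick one candidate segment per time step, minimizing the sum of
    target costs and concatenation costs (Viterbi-style DP).

    candidates:  list of lists; candidates[t] are database segments for step t
    target_cost: target_cost(t, seg) -> mismatch between seg and step t's target
    concat_cost: concat_cost(a, b)   -> visual discontinuity when b follows a
    """
    T = len(candidates)
    # best[t][j] = (cumulative cost, backpointer into candidates[t-1])
    best = [[(target_cost(0, s), None) for s in candidates[0]]]
    for t in range(1, T):
        row = []
        for seg in candidates[t]:
            # cheapest way to reach this segment from any previous candidate
            cost, back = min(
                (best[t - 1][i][0] + concat_cost(prev, seg), i)
                for i, prev in enumerate(candidates[t - 1])
            )
            row.append((cost + target_cost(t, seg), back))
        best.append(row)
    # backtrack from the cheapest final candidate
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for t in range(T - 1, 0, -1):
        j = best[t][j][1]
        path.append(j)
    path.reverse()
    return [candidates[t][path[t]] for t in range(T)]


# Toy usage with scalar "segments": targets 0, 1, 2 favor the smooth
# low-valued path over the high-valued one.
candidates = [[0, 5], [1, 6], [2, 7]]
targets = [0, 1, 2]
tc = lambda t, s: abs(s - targets[t])   # hypothetical target cost
cc = lambda a, b: abs(b - a)            # hypothetical concatenation cost
print(unit_selection(candidates, tc, cc))  # → [0, 1, 2]
```

In the paper's setting the segments would be mouth image sequences and the costs would compare image features and expression labels rather than scalars, but the search structure is the same.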