Trainable videorealistic speech animation
Proceedings of the 29th annual conference on Computer graphics and interactive techniques
Hi-index | 0.00 |
Speech animation is traditionally considered as important but tedious work for most applications, because the muscles on the face are complex and dynamically interacting. In this paper, we introduce a framework for synthesizing a 3D lip-sync speech animation by a given speech sequence and its corresponding texts. We first identify the representative key-lip-shapes from a training video that are important for blend-shapes and guiding the artist to create corresponding 3D key-faces (lips). The training faces in the video are then cross-mapped to the crafted key-faces to construct the Dominated Animeme Models (DAM) for each kind of phoneme. Considering the coarticulation effects in animation control signals from the cross-mapped training faces, the DAM computes two functions: polynomial-fitted animeme shape functions and corresponding dominance weighting functions. Finally, given a novel speech sequence and its corresponding texts, a lip-sync speech animation can be synthesized in a short time with the DAM.