Lip synchronization is a method for determining mouth and tongue motion during speech. It is widely used in multimedia production, and real-time implementation is opening application possibilities in multimodal interfaces. We present an implementation of real-time, language-independent lip synchronization based on classifying the speech signal, represented by MFCC vectors, into visemes using neural networks (NNs). Our implementation improves real-time lip synchronization by using a genetic algorithm to obtain a near-optimal NN topology. Automatic NN configuration with genetic algorithms eliminates the need for tedious manual NN design by trial and error and considerably improves viseme classification results. Moreover, using visemes directly as the basic unit of classification reduces computational overhead, since only visemes are needed to animate the face. The system was comprehensively validated with three evaluation methods, two objective and one subjective. The results indicate very good lip synchronization quality in real-time conditions and across different languages, making the method suitable for a wide range of applications.
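The genetic topology search described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the genome encodes hidden-layer sizes, and the `fitness` function is a hypothetical stand-in for training an MFCC-to-viseme classifier with that topology and returning its validation accuracy (here replaced by a toy surrogate so the sketch is self-contained and runnable).

```python
import random

def fitness(hidden_sizes):
    """Stand-in for: train an MFCC->viseme NN with this topology and
    return validation accuracy. Toy surrogate: prefer a total hidden
    size near 40 units (an arbitrary illustrative optimum)."""
    total = sum(hidden_sizes)
    return 1.0 / (1.0 + abs(total - 40))

def random_genome():
    # One or two hidden layers, 4..64 units each (assumed search space).
    n_layers = random.choice([1, 2])
    return [random.randint(4, 64) for _ in range(n_layers)]

def mutate(genome):
    # Perturb each layer size, clamped to the search range.
    return [max(4, min(64, s + random.randint(-8, 8))) for s in genome]

def crossover(a, b):
    # Per-position uniform crossover; length follows the shorter parent.
    return [random.choice(pair) for pair in zip(a, b)]

def evolve(generations=30, pop_size=20, seed=0):
    random.seed(seed)
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]          # keep the best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
print("best topology:", best)
```

In a real setting the fitness evaluation is the expensive step (each candidate topology must be trained and validated on labeled viseme data), which is exactly why automating the search pays off compared with manual trial-and-error design.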