Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation

  • Authors:
  • Lucas D. Terissi; Juan Carlos Gómez

  • Affiliations:
  • Laboratory for System Dynamics and Signal Processing, FCEIA, Universidad Nacional de Rosario; CIFASIS, CONICET, 2000 Rosario, Argentina (both authors)

  • Venue:
  • SBIA '08 Proceedings of the 19th Brazilian Symposium on Artificial Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2008

Abstract

In this paper, the inversion of a joint Audio-Visual Hidden Markov Model is proposed to estimate the visual information from speech data in a speech-driven, MPEG-4-compliant facial animation system. The inversion algorithm is derived for the general case of full covariance matrices for the audio-visual observations. System performance is evaluated for both full and diagonal covariance matrices. Experimental results show that full covariance matrices are preferable, since performance similar to that obtained with diagonal matrices can be achieved with a less complex model. The experiments are carried out on audio-visual databases compiled by the authors.
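The core idea of audio-to-visual HMM inversion can be illustrated with a toy sketch: given a joint audio-visual HMM with Gaussian state emissions, the visual features at each frame can be estimated as a state-posterior-weighted conditional mean, where a full covariance matrix contributes the audio-visual cross-covariance term. The sketch below is not the authors' exact algorithm; all model parameters (states, dimensions, transition matrix) are synthetic assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint audio-visual HMM: N states, audio dim da, visual dim dv (all assumed).
N, da, dv = 3, 2, 2
d = da + dv
A = np.full((N, N), 1.0 / N)     # state transition matrix
pi = np.full(N, 1.0 / N)         # initial state probabilities
mu = rng.normal(size=(N, d))     # joint audio-visual mean per state
Sigma = np.empty((N, d, d))      # full covariance per state (random SPD)
for j in range(N):
    B = rng.normal(size=(d, d))
    Sigma[j] = B @ B.T + d * np.eye(d)

def gauss(x, m, S):
    """Multivariate Gaussian density."""
    diff = x - m
    return np.exp(-0.5 * diff @ np.linalg.solve(S, diff)) / \
        np.sqrt((2 * np.pi) ** len(m) * np.linalg.det(S))

def estimate_visual(audio):
    """Estimate visual features from audio via posterior-weighted
    conditional means (a generic HMM-inversion-style estimator)."""
    T = len(audio)
    mu_a, mu_v = mu[:, :da], mu[:, da:]
    Saa = Sigma[:, :da, :da]     # audio marginal covariance
    Sva = Sigma[:, da:, :da]     # visual-audio cross-covariance
    # Audio likelihood b[t, j] under each state's marginal Gaussian.
    b = np.array([[gauss(a, mu_a[j], Saa[j]) for j in range(N)] for a in audio])
    # Scaled forward-backward over the audio stream.
    alpha = np.empty((T, N))
    beta = np.empty((T, N))
    alpha[0] = pi * b[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (b[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # state posteriors
    # Visual estimate: mix the per-state conditional means E[v | a, state j].
    # The cross term Sva @ inv(Saa) only exists with full covariance matrices;
    # with diagonal covariance it vanishes and the estimate reduces to mu_v.
    v_hat = np.empty((T, dv))
    for t in range(T):
        est = np.zeros(dv)
        for j in range(N):
            cond = mu_v[j] + Sva[j] @ np.linalg.solve(Saa[j], audio[t] - mu_a[j])
            est += gamma[t, j] * cond
        v_hat[t] = est
    return v_hat

audio = rng.normal(size=(10, da))
v = estimate_visual(audio)
print(v.shape)  # (10, 2): one estimated visual vector per audio frame
```

Note the comment in the mixing step: the audio-visual cross-covariance correction is exactly what a diagonal covariance model discards, which is why the paper finds that full covariance matrices achieve comparable accuracy with fewer states.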