An audio-visual imposture scenario by talking face animation

Authors:
Walid Karam; Chafic Mokbel; Hanna Greige; Guido Aversano; Catherine Pelachaud; Gérard Chollet
Affiliations:
Computer Science Department, University of Balamand, Tripoli, Lebanon (Karam, Mokbel, Greige); Ecole Nationale Supérieure des Télécommunications, Paris, France (Aversano, Chollet); IUT–Université Paris 8, Montreuil, France (Pelachaud)
Venue:
Nonlinear Speech Modeling and Applications
Year:
2005

Abstract

We describe a system that allows an impostor to conduct an audio-visual telephone conversation and to sign data electronically on behalf of an authorized client. During the conversation, the impostor's audio and video are altered so as to mimic the client. The impostor's voice is processed and used to reproduce the voice of the authorized client. Speech segments obtained from the client's recordings are used to synthesize new sentences that the client never pronounced. On the visual side, the impostor's talking face is detected, and facial features are extracted and used to animate a synthetic talking face. The texture of the impersonated face is mapped onto the talking head and coded for transmission over the phone, along with the synthesized voice. Audio-visual coding and synthesis are realized by indexing into a memory of audio-visual sequences. Stochastic models (coupled HMMs) of characteristic segments are used to drive the memory search.
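The abstract states only that coupled HMMs of characteristic segments drive the search through the audio-visual memory. As a rough illustration under stated assumptions, the Python sketch below gives each memory entry its own two-chain (audio, visual) coupled HMM over discrete symbols, scores a frame-aligned pair of observation streams with the scaled forward algorithm, and returns the best-scoring entry. The names `CoupledHMM`, `select_segment` and `toy_model`, the discrete symbols and all probabilities are hypothetical; this is not the authors' implementation.

```python
# Hypothetical sketch of coupled-HMM scoring for memory search; not the paper's implementation.
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class CoupledHMM:
    """Hypothetical two-chain coupled HMM with discrete audio and visual symbols.

    pi_a[i], pi_v[j]   initial state probabilities of the audio / visual chain
    trans_a[k][l][i]   P(audio state i at t | audio state k, visual state l at t-1)
    trans_v[k][l][j]   P(visual state j at t | audio state k, visual state l at t-1)
    emit_a[i][o]       P(audio symbol o | audio state i); emit_v likewise
    """
    pi_a: List[float]
    pi_v: List[float]
    trans_a: List[List[List[float]]]
    trans_v: List[List[List[float]]]
    emit_a: List[List[float]]
    emit_v: List[List[float]]

    def _emit(self, i: int, j: int, oa: int, ov: int) -> float:
        # audio and visual symbols are emitted independently given the joint state
        return self.emit_a[i][oa] * self.emit_v[j][ov]

    def log_likelihood(self, obs_a: Sequence[int], obs_v: Sequence[int]) -> float:
        """log P(obs_a, obs_v) via the scaled forward algorithm on joint states (i, j)."""
        if not obs_a or len(obs_a) != len(obs_v):
            raise ValueError("streams must be non-empty and frame-aligned")
        na, nv = len(self.pi_a), len(self.pi_v)
        alpha = [[self.pi_a[i] * self.pi_v[j] * self._emit(i, j, obs_a[0], obs_v[0])
                  for j in range(nv)] for i in range(na)]
        log_lik = 0.0
        for t in range(len(obs_a)):
            if t > 0:
                prev = alpha
                # each chain's next state is conditioned on BOTH chains' previous states
                alpha = [[sum(prev[k][l] * self.trans_a[k][l][i] * self.trans_v[k][l][j]
                              for k in range(na) for l in range(nv))
                          * self._emit(i, j, obs_a[t], obs_v[t])
                          for j in range(nv)] for i in range(na)]
            scale = sum(map(sum, alpha))  # P(frame t | earlier frames)
            if scale == 0.0:
                return float("-inf")
            log_lik += math.log(scale)
            alpha = [[x / scale for x in row] for row in alpha]  # renormalize to avoid underflow
        return log_lik


def select_segment(memory: Dict[str, CoupledHMM],
                   obs_a: Sequence[int], obs_v: Sequence[int]) -> Tuple[str, float]:
    """Return the memory entry whose model best explains the observed streams."""
    scores = {name: model.log_likelihood(obs_a, obs_v) for name, model in memory.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def toy_model(flip: bool) -> CoupledHMM:
    """Hypothetical 2-state / 2-symbol entry; flip=True swaps which symbol state 0 prefers."""
    def coupled_rows() -> List[List[List[float]]]:
        # chains that agreed at t-1 tend to stay put; disagreeing chains are uncertain
        table = {(0, 0): [0.9, 0.1], (1, 1): [0.1, 0.9]}
        return [[table.get((k, l), [0.5, 0.5]) for l in range(2)] for k in range(2)]
    emit = [[0.1, 0.9], [0.9, 0.1]] if flip else [[0.9, 0.1], [0.1, 0.9]]
    return CoupledHMM(pi_a=[0.9, 0.1], pi_v=[0.9, 0.1],
                      trans_a=coupled_rows(), trans_v=coupled_rows(),
                      emit_a=emit, emit_v=emit)


memory = {"seg_A": toy_model(flip=False), "seg_B": toy_model(flip=True)}
best, score = select_segment(memory, obs_a=[0, 0, 0, 0], obs_v=[0, 0, 0, 0])
print(best, round(score, 3))
```

Scoring each entry with the full forward probability rather than a single Viterbi path is only one possible design choice. A real system would more plausibly use continuous emissions over acoustic and lip-shape features and search over sequences of segments rather than isolated entries.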