A Cantonese speech-driven talking face using translingual audio-to-visual conversion

  • Authors:
  • Lei Xie; Helen Meng; Zhi-Qiang Liu

  • Affiliations:
  • Lei Xie, Helen Meng: Human-Computer Communications Laboratory, Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Hong Kong
  • Zhi-Qiang Liu: School of Creative Media, City University of Hong Kong, Hong Kong

  • Venue:
  • ISCSLP'06: Proceedings of the 5th International Conference on Chinese Spoken Language Processing
  • Year:
  • 2006

Abstract

This paper proposes a novel approach to a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings for a base language (English). Given Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. We then map it to an English phoneme transcription via a translingual mapping scheme that involves symbol mapping and time alignment from Cantonese syllables to English phonemes. Given the phoneme transcription, the input speech, and the audio-visual models for English, an EM-based conversion algorithm generates the mouth animation parameters associated with the input Cantonese audio. We have carried out audio-visual syllable recognition experiments to evaluate the proposed talking face objectively. Results show that the visual speech synthesized by the Cantonese talking face effectively increases the accuracy of Cantonese syllable recognition under noisy acoustic conditions.
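
The translingual mapping step in this pipeline (symbol mapping plus time alignment from Cantonese syllables to English phonemes) can be pictured with the minimal sketch below. The Jyutping syllable labels, the toy SYLLABLE_TO_PHONEMES table, and the even-split duration rule are illustrative assumptions for this sketch only; they are not the paper's actual mapping tables or alignment procedure.

```python
# Hypothetical sketch of the translingual mapping step: each recognized
# Cantonese syllable, with its time boundaries, is rewritten as a sequence
# of English phonemes, and the syllable's time span is distributed across
# those phonemes. All symbols and the duration rule here are assumptions.

from dataclasses import dataclass

@dataclass
class Segment:
    label: str      # syllable or phoneme symbol
    start: float    # start time in seconds
    end: float      # end time in seconds

# Toy symbol-mapping table from Jyutping syllables to ARPAbet-style English
# phonemes. A real system would cover the full Cantonese syllable inventory.
SYLLABLE_TO_PHONEMES = {
    "nei5": ["N", "EY"],
    "hou2": ["HH", "OW"],
    "maa3": ["M", "AA"],
}

def map_syllables_to_phonemes(syllables):
    """Convert a time-aligned Cantonese syllable transcription into a
    time-aligned English phoneme transcription by splitting each
    syllable's duration evenly among its mapped phonemes."""
    phoneme_segments = []
    for syl in syllables:
        phonemes = SYLLABLE_TO_PHONEMES.get(syl.label)
        if phonemes is None:
            continue  # unmapped syllable; a real system needs full coverage
        step = (syl.end - syl.start) / len(phonemes)
        for i, ph in enumerate(phonemes):
            phoneme_segments.append(
                Segment(ph, syl.start + i * step, syl.start + (i + 1) * step))
    return phoneme_segments

if __name__ == "__main__":
    # "nei5 hou2" ("hello"), with recognizer-style time boundaries
    cantonese = [Segment("nei5", 0.00, 0.30), Segment("hou2", 0.30, 0.65)]
    for seg in map_syllables_to_phonemes(cantonese):
        print(f"{seg.label:>3}  {seg.start:.2f}-{seg.end:.2f}s")
```

Splitting each syllable's span evenly among its phonemes is only the crudest possible alignment; any scheme that preserves the recognizer's syllable boundaries would slot into the same interface before the EM-based audio-to-visual conversion stage.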