Kernel-Based lip shape clustering with phoneme recognition for real-time voice driven talking face

Authors:
Po-Yi Shih;Jhing-Fa Wang;Zong-You Chen
Affiliations:
Department of Electrical Engineering, National Cheng Kung University, Tainan City, Taiwan;Department of Electrical Engineering, National Cheng Kung University, Tainan City, Taiwan;Department of Electrical Engineering, National Cheng Kung University, Tainan City, Taiwan
Venue:
ISNN'10 Proceedings of the 7th international conference on Advances in Neural Networks - Volume Part II
Year:
2010

Abstract

This work describes a real-time voice-driven method that synchronizes a speaker's lip shape with the corresponding speech signal on low-bandwidth mobile devices. Phoneme recognition is generally regarded as an important task in the operation of a real-time lip-sync system. In this work, we use a kernel-based lip shape clustering algorithm inspired by one-class support vector machines (SVMs). Speakers with similar lip shapes are clustered, and cluster-dependent vowel phonemes are then constructed for each cluster. We use the sum of absolute differences (SAD) as the vowel lip shape likelihood for clustering lip shapes into categories. The source and destination lip shape pictures are then blended by adjusting their transparency with alpha blending to produce the lip-sync animation. We find that this method outperforms the conventional CHMM method in phoneme error rate (PER): 8.78% versus 32.25%.
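
The two image operations named in the abstract, SAD and alpha blending, can be illustrated with a minimal sketch. The abstract does not say how SAD enters the kernel-based clustering, so the nearest-template assignment below is only an assumed stand-in for the authors' algorithm. The image representation and all function names (`sad`, `nearest_category`, `alpha_blend`, `transition_frames`) are hypothetical.

```python
from typing import Dict, List

# Assumption: a lip image is a grayscale grid given as rows of pixel intensities.
Image = List[List[float]]


def sad(a: Image, b: Image) -> float:
    """Sum of absolute differences between two same-sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def nearest_category(lip: Image, templates: Dict[str, Image]) -> str:
    """Label of the template with the smallest SAD to `lip`.

    Assumed simplification: a lower SAD is treated as a higher likelihood.
    """
    return min(templates, key=lambda label: sad(lip, templates[label]))


def alpha_blend(src: Image, dst: Image, alpha: float) -> Image:
    """Per-pixel blend: alpha=0 returns src, alpha=1 returns dst."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * ps + alpha * pd for ps, pd in zip(rs, rd)]
        for rs, rd in zip(src, dst)
    ]


def transition_frames(src: Image, dst: Image, steps: int) -> List[Image]:
    """In-between frames from src to dst, both endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [alpha_blend(src, dst, i / (steps - 1)) for i in range(steps)]
```

For instance, `transition_frames(src, dst, 5)` would yield five frames whose blend weight moves from the source lip shape to the destination in equal increments; the number of frames per transition is a free parameter here, not a value taken from the paper.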