Viseme classification for talking head application

Authors:
Mariusz Leszczynski;Władysław Skarbek
Affiliations:
Faculty of Electronics and Information Technology, Warsaw University of Technology;Faculty of Electronics and Information Technology, Warsaw University of Technology
Venue:
CAIP'05 Proceedings of the 11th international conference on Computer Analysis of Images and Patterns
Year:
2005

Abstract

Real-time classification algorithms are presented for visual mouth appearances (visemes), which correspond to phonemes and their speech contexts. They are used in the design of a talking head application. Two feature extraction procedures were evaluated. The first is based on a normalized triangle mesh covering the mouth area and a color image texture vector indexed by barycentric coordinates. The second applies the Discrete Fourier Transform to the image rectangle containing the mouth and keeps a small block of DFT coefficients as the feature vector. The classifier was designed with an optimized LDA method that uses a two-singular-subspace approach. Despite its higher computational cost (about three milliseconds per video frame on a 3.2 GHz Pentium IV), the DFT+LDA approach has practical advantages over the MESH+LDA classifier. First, its recognition rate is more than two percentage points higher (99.3% versus 97.2%). Second, automatic identification of the rectangle covering the mouth is more robust than automatic identification of the triangle mesh covering the mouth.
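
To make the DFT+LDA pipeline concrete, the following is a minimal sketch and not the authors' implementation. It assumes a grayscale mouth rectangle, takes the magnitudes of the lowest-frequency 4x4 block of DFT coefficients as features (the abstract specifies neither the block size nor how the complex coefficients are encoded), and substitutes textbook Fisher LDA with nearest-centroid classification for the optimized two-singular-subspace LDA. The names dft_block_features, fit_lda, and classify are hypothetical.

```python
import numpy as np

def dft_block_features(mouth_gray, block=4):
    """Magnitudes of the lowest-frequency block x block DFT coefficients.

    Assumption: block size and the use of magnitudes are illustrative choices.
    """
    spectrum = np.fft.fft2(np.asarray(mouth_gray, dtype=float))
    low = spectrum[:block, :block]               # small block of DFT coefficients
    return np.abs(low).ravel() / spectrum.size   # scale by pixel count

def fit_lda(X, y, reg=1e-6):
    """Classic Fisher LDA (a stand-in for the paper's optimized LDA)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                        # within-class scatter
    Sb = np.zeros((d, d))                        # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(d)                        # small ridge keeps Sw invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[: len(classes) - 1]
    W = evecs[:, order].real                     # discriminant directions
    centroids = np.array([X[y == c].mean(axis=0) @ W for c in classes])
    return W, centroids, classes

def classify(x, W, centroids, classes):
    """Assign the class whose projected centroid is nearest to x."""
    z = x @ W
    return classes[np.argmin(np.linalg.norm(centroids - z, axis=1))]
```

In use, X would hold one feature row per training frame and y the viseme label of each frame. Per frame, the cost is one 2D FFT of the mouth rectangle plus a small matrix product.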