Viseme classification for talking head application

  • Authors:
  • Mariusz Leszczynski; Władysław Skarbek

  • Affiliations:
  • Faculty of Electronics and Information Technology, Warsaw University of Technology;Faculty of Electronics and Information Technology, Warsaw University of Technology

  • Venue:
  • CAIP'05 Proceedings of the 11th international conference on Computer Analysis of Images and Patterns
  • Year:
  • 2005

Abstract

Real-time classification algorithms are presented for visual mouth appearances (visemes), which correspond to phonemes and their speech contexts. They are used in the design of a talking head application. Two feature extraction procedures were verified. The first is based on a normalized triangle mesh covering the mouth area and a color image texture vector indexed by barycentric coordinates. The second applies the Discrete Fourier Transform to the image rectangle containing the mouth and keeps a small block of DFT coefficients. The classifier was designed with an optimized LDA method that uses a two singular subspace approach. Despite its higher computational complexity (about three milliseconds per video frame on a 3.2 GHz Pentium IV), the DFT+LDA approach has practical advantages over the MESH+LDA classifier. First, its recognition rate is more than two percentage points higher (99.3% versus 97.2%). Second, automatic identification of the rectangle covering the mouth is more robust than automatic identification of the triangle mesh covering the mouth.
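
The DFT-based feature extraction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name dft_block_features, the block size of 4, the grayscale conversion by channel averaging, and the use of coefficient magnitudes normalized by pixel count are all assumptions made for illustration, since the abstract does not specify them. The resulting vector would then be passed to an LDA classifier, which is not shown here.

```python
import numpy as np


def dft_block_features(mouth_rect, block=4):
    """Return a feature vector built from a small block of 2-D DFT coefficients.

    Assumptions (not taken from the paper): the block is the lowest-frequency
    corner of the spectrum, color input is averaged to grayscale, and
    magnitudes are normalized by the number of pixels.
    """
    gray = np.asarray(mouth_rect, dtype=np.float64)
    if gray.ndim == 3:
        # Collapse color channels to one intensity channel.
        gray = gray.mean(axis=2)
    # Full 2-D DFT of the mouth rectangle.
    spectrum = np.fft.fft2(gray)
    # Keep only the block x block coefficients nearest the DC term.
    low = spectrum[:block, :block]
    # Magnitudes, scaled so the DC feature equals the mean intensity.
    return np.abs(low).ravel() / gray.size


# Hypothetical usage on a synthetic 32 x 64 color "mouth rectangle".
rng = np.random.default_rng(0)
frame = rng.random((32, 64, 3))
features = dft_block_features(frame, block=4)
print(features.shape)
```

With a 4 x 4 block the feature vector has 16 entries regardless of the rectangle size, which keeps the per-frame cost of the downstream classifier small.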