Visual recognition of speech consonants using facial movement features

  • Authors:
  • Wai Chee Yau; Dinesh Kant Kumar; Sridhar Poosapadi Arjunan

  • Affiliations:
  • School of Electrical and Computer Engineering, RMIT University, GPO Box 2476V Melbourne, Victoria 3001, Australia (all three authors). Corresponding author: Dinesh Kant Kumar, Tel.: +61399251954, E-mail: dinesh@rmit.edu.au

  • Venue:
  • Integrated Computer-Aided Engineering - Informatics in Control, Automation and Robotics
  • Year:
  • 2007

Abstract

This paper presents a visual speech recognition technique based on video of facial movement. The acoustic signals of consonants are easily confused in noisy environments. To overcome this limitation, the paper focuses on identifying consonants from visual information and investigates the feasibility of using facial movements to identify phonemes. The proposed approach adopts a visual speech model based on the viseme model of the Moving Picture Experts Group 4 (MPEG-4) standard. It is a movement-based system: the facial movements are segmented from the video using an accumulative image subtraction method that yields a 2-D grayscale motion history image (MHI). Features are extracted from the MHI using a combination of the discrete stationary wavelet transform (SWT) and image moments (Hu moments, geometric moments and Zernike moments). Feedforward multilayer perceptron (MLP) neural networks trained with the backpropagation (BPN) learning algorithm classify the features, which allows the performance of the three moment features to be compared. The experimental results indicate that Zernike moments have better representation ability and provide rotation invariance for the proposed application. The results also demonstrate that the proposed technique can identify consonants reliably using the viseme model of the MPEG-4 standard, with a recognition rate of 85%.
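
The accumulative image subtraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the version below is an assumption: it uses a common ramp-weighted motion history image in which each consecutive frame difference is thresholded and more recent motion is written with a brighter gray level. The function name `motion_history_image` and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def motion_history_image(frames, threshold=25):
    """Collapse a grayscale video into one 2-D motion history image.

    frames: array-like of shape (T, H, W) with 8-bit gray levels.
    threshold: assumed minimum absolute gray-level change that counts
        as motion (hypothetical value, not from the paper).
    Returns an (H, W) uint8 image; brighter pixels moved more recently.
    """
    # Signed type so that subtracting uint8 frames cannot wrap around.
    frames = np.asarray(frames, dtype=np.int16)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("need at least two frames of shape (H, W)")

    n_diffs = frames.shape[0] - 1
    mhi = np.zeros(frames.shape[1:], dtype=np.float64)

    for t in range(1, n_diffs + 1):
        # Binarised difference of consecutive frames marks moving pixels.
        moved = np.abs(frames[t] - frames[t - 1]) >= threshold
        # Later motion overwrites earlier motion with a larger time stamp.
        mhi[moved] = t

    # Scale the time stamps to the 0..255 grayscale range.
    return np.round(255.0 * mhi / n_diffs).astype(np.uint8)
```

Under these assumptions, a whole utterance collapses into a single grayscale image whose intensity encodes when each pixel last moved. That image would then be passed to the feature extraction stage (SWT followed by image moments), which this sketch does not cover.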