Visual speech recognition using motion features and hidden Markov models
CAIP'07 Proceedings of the 12th international conference on Computer analysis of images and patterns
This paper reports on a visual speech recognition method that is invariant to translation, rotation and scale. Dynamic features representing mouth motion are extracted from the video data using a motion segmentation technique termed the motion history image (MHI). The MHI is generated by applying an accumulative image-differencing technique to the sequence of mouth images. Invariant features are derived from the MHI using a feature extraction algorithm that combines the Discrete Stationary Wavelet Transform (SWT) and moments. A 2-D SWT at level one is applied to decompose the MHI into one approximation sub-image and three detail sub-images. The feature descriptors consist of three sets of moments (geometric moments, Hu moments and Zernike moments) computed from the SWT approximation image. The moment features are normalized to achieve the invariance properties. An artificial neural network (ANN) with the back-propagation learning algorithm is used to classify the moment features. Initial experiments were conducted to test the sensitivity of the proposed approach to rotation, translation and scale of the mouth images, and promising results were obtained.
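The accumulative image-differencing step behind the MHI can be sketched as follows. This is a minimal NumPy version of the standard MHI update rule (recently moved pixels are set to a duration `tau`, older motion decays toward zero); the threshold and decay values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def motion_history_image(frames, thresh=0.5, tau=None):
    """Sketch of an MHI built by accumulative frame differencing.

    frames: sequence of grayscale frames, each an (H, W) array.
    A pixel that changed between consecutive frames is stamped with
    tau; unchanged pixels decay by 1 per frame, floored at zero, so
    brighter MHI values mean more recent mouth motion.
    """
    frames = np.asarray(frames, dtype=float)
    T = len(frames)
    tau = (T - 1) if tau is None else tau
    mhi = np.zeros(frames[0].shape)
    for t in range(1, T):
        moving = np.abs(frames[t] - frames[t - 1]) > thresh
        mhi = np.where(moving, tau, np.maximum(mhi - 1.0, 0.0))
    return mhi
```

Given a short clip of mouth frames, the result is a single grayscale image summarizing where and how recently the lips moved, which is what the later feature-extraction stages operate on.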
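The level-one 2-D SWT decomposition can be illustrated with the Haar wavelet (an assumption here; the abstract does not name the wavelet family). Unlike the decimated DWT, the stationary transform skips downsampling, so all four sub-images keep the size of the input MHI; this sketch uses circular boundary handling via `np.roll`.

```python
import numpy as np

def swt2_level1_haar(img):
    """Level-1 2-D stationary (undecimated) Haar wavelet transform.

    Applies the Haar lowpass [1, 1]/2 and highpass [1, -1]/2 filters
    along rows and columns WITHOUT downsampling, yielding one
    approximation sub-image (LL) and three detail sub-images
    (LH, HL, HH), each the same shape as the input.
    """
    lo = lambda a, axis: (a + np.roll(a, -1, axis)) / 2.0
    hi = lambda a, axis: (a - np.roll(a, -1, axis)) / 2.0
    LL = lo(lo(img, 0), 1)  # approximation: lowpass both directions
    LH = hi(lo(img, 0), 1)  # detail: lowpass rows, highpass columns
    HL = lo(hi(img, 0), 1)  # detail: highpass rows, lowpass columns
    HH = hi(hi(img, 0), 1)  # detail: highpass both directions
    return LL, LH, HL, HH
```

In the paper's pipeline only the approximation sub-image (LL) feeds the moment computation; a library implementation such as PyWavelets' `swt2` would serve the same role with other wavelet families.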
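Of the three moment sets, the Hu moments show most directly how normalization yields invariance: raw geometric moments are centered (translation invariance), scale-normalized, and then combined into seven rotation-invariant quantities. A self-contained NumPy sketch, not the paper's implementation:

```python
import numpy as np

def hu_moments(img):
    """Seven Hu moment invariants of a grayscale image.

    Pipeline: raw geometric moments -> central moments (translation
    invariant) -> normalized central moments (scale invariant) ->
    Hu's seven combinations (rotation invariant).
    """
    h, w = img.shape
    y, x = np.mgrid[:h, :w].astype(float)
    m = lambda p, q: np.sum((x ** p) * (y ** q) * img)      # raw moments
    m00 = m(0, 0)
    xc, yc = m(1, 0) / m00, m(0, 1) / m00                   # centroid
    mu = lambda p, q: np.sum(((x - xc) ** p) * ((y - yc) ** q) * img)
    eta = lambda p, q: mu(p, q) / m00 ** (1 + (p + q) / 2)  # scale-normalized
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
        (n30 - 3 * n12) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            + (3 * n21 - n03) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2),
        (n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
            + 4 * n11 * (n30 + n12) * (n21 + n03),
        (3 * n21 - n03) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            - (n30 - 3 * n12) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2),
    ])
```

Shifting an image leaves the central moments, and hence the Hu invariants, unchanged, which is the property the paper exploits for translation invariance of the mouth region.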
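The classification stage, an ANN trained with back-propagation, can be sketched as a one-hidden-layer network with sigmoid units and full-batch gradient descent. The layer sizes, learning rate, and mean-squared-error loss below are illustrative assumptions; the abstract does not report the network architecture.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=1.0, epochs=2000, seed=0):
    """One-hidden-layer MLP trained with plain back-propagation.

    X: (n, d) feature vectors (e.g. normalized moment features),
    y: (n, 1) binary targets. Returns the learned weights and the
    per-epoch mean-squared-error losses.
    """
    rng = np.random.default_rng(seed)
    add1 = lambda a: np.hstack([a, np.ones((len(a), 1))])  # bias column
    W1 = rng.normal(0, 0.5, (X.shape[1] + 1, hidden))
    W2 = rng.normal(0, 0.5, (hidden + 1, 1))
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    losses = []
    for _ in range(epochs):
        Xb = add1(X)
        h = sig(Xb @ W1)           # forward pass, hidden layer
        hb = add1(h)
        out = sig(hb @ W2)         # forward pass, output layer
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # backward pass: chain rule through the sigmoid derivatives
        d_out = err * out * (1 - out)
        d_h = (d_out @ W2[:-1].T) * h * (1 - h)
        W2 -= lr * hb.T @ d_out / len(X)
        W1 -= lr * Xb.T @ d_h / len(X)
    return W1, W2, losses
```

In the paper's setting the inputs would be the normalized moment feature vectors and the outputs the visual speech classes; a softmax output with cross-entropy loss would be the more common modern choice for multi-class labels.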