Block-based motion estimation analysis for lip reading user authentication systems

  • Authors:
  • Khaled Alghathbar; Hanan A. Mahmoud

  • Affiliations:
  • Centre of Excellence in Information Assurance, King Saud University, Riyadh, Kingdom of Saudi Arabia (both authors)

  • Venue:
  • WSEAS Transactions on Information Science and Applications
  • Year:
  • 2009


Abstract

This paper proposes a lip reading technique for speech recognition based on motion estimation analysis. The method is a sub-system of the Silent Pass project, a lip reading password entry system for security applications that provides user authentication through lip-read passwords. Motion estimation is performed on image sequences of lip movement representing speech, and it is computed without extracting the speaker's lip contours or location, which yields robust visual features for the lip movements of an utterance. The methodology comprises two phases: a training phase and a recognition phase. In both phases, each n × n video frame of the image sequence for an utterance (an alphanumeric character, a word, or, in more complicated analysis, a sentence) is divided into m × m blocks. For each frame, the method computes and fits eight curves, each representing the motion estimation of that frame in a specific direction. These eight curves form the feature set of the frame and are extracted in an unsupervised manner; the feature set consists of the integral values of the motion estimation. These features are expected to be highly effective in the training phase and characterize specific utterances with no additional acoustic feature set. In the training phase, a corpus of utterances and their motion estimation features is built. In the recognition phase, the feature set is extracted from the new image sequence of lip movement for an utterance and compared to the corpus using the mean square error metric.
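
To make the pipeline concrete, the following is a minimal NumPy sketch of the kind of processing the abstract describes, not the authors' implementation: exhaustive block matching with a sum-of-absolute-differences criterion, eight directional curves obtained by binning motion vectors at 45-degree intervals, integral features taken as simple sums of the curves, and mean-square-error matching against a stored corpus. The block size, search range, matching criterion, and all function names (block_motion_vectors, directional_curves, feature_vector, recognize) are illustrative assumptions.

    import numpy as np

    def block_motion_vectors(prev, curr, block=8, search=4):
        """Exhaustive block matching (an assumed scheme): for each block x block
        patch in `prev`, find the best-matching patch in `curr` within +/- `search`
        pixels by sum of absolute differences, returning one motion vector per block."""
        h, w = prev.shape
        vectors = []
        for by in range(0, h - block + 1, block):
            for bx in range(0, w - block + 1, block):
                ref = prev[by:by + block, bx:bx + block].astype(np.float64)
                best, best_v = np.inf, (0, 0)
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        y, x = by + dy, bx + dx
                        if y < 0 or x < 0 or y + block > h or x + block > w:
                            continue
                        cand = curr[y:y + block, x:x + block].astype(np.float64)
                        sad = np.abs(ref - cand).sum()
                        if sad < best:
                            best, best_v = sad, (dy, dx)
                vectors.append(best_v)
        return vectors

    def directional_curves(frames, block=8, search=4):
        """For each frame transition, accumulate motion-vector magnitude into
        eight 45-degree direction bins, yielding eight curves over the sequence."""
        curves = np.zeros((8, len(frames) - 1))
        for t in range(len(frames) - 1):
            for dy, dx in block_motion_vectors(frames[t], frames[t + 1], block, search):
                mag = np.hypot(dy, dx)
                if mag == 0:
                    continue
                angle = np.arctan2(dy, dx) % (2 * np.pi)
                bin_idx = int(round(angle / (np.pi / 4))) % 8
                curves[bin_idx, t] += mag
        return curves

    def feature_vector(frames, **kw):
        """Feature set: the integral of each directional curve, approximated
        here as a simple sum over frames."""
        return directional_curves(frames, **kw).sum(axis=1)

    def recognize(features, corpus):
        """Return the corpus label whose stored features minimize the MSE."""
        best_label, best_mse = None, np.inf
        for label, ref in corpus.items():
            mse = np.mean((features - ref) ** 2)
            if mse < best_mse:
                best_label, best_mse = label, mse
        return best_label, best_mse

Under these assumptions, training would amount to storing feature vectors for enrolled utterances, e.g. corpus = {"pass1": feature_vector(train_frames)}, and recognition would call recognize(feature_vector(test_frames), corpus) to return the closest match and its mean square error.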