An automatic lip-reading system is among the assistive technologies for hearing-impaired or elderly people. One can imagine, for example, a dependent person commanding a machine with a simple lip movement or by pronouncing a viseme (visual phoneme). A lip-reading system comprises three subsystems: a lip-localization subsystem, a feature-extraction subsystem, and a classification subsystem that maps feature vectors to visemes. The major difficulty in a lip-reading system is the extraction of visual speech descriptors, which requires automatic localization and tracking of labial gestures. In this paper we present a new automatic approach for localizing lip points of interest (POIs) and extracting features from a speaker's face, based on mouth color information and a geometrical model of the lips. The extracted visual information is then classified in order to recognize the uttered viseme. We have developed a prototype, ALiFE (Automatic Lip Feature Extraction), and evaluated it with multiple speakers under natural conditions. Experiments cover a group of French visemes uttered by different speakers. Results show that the system recognizes 94.64% of the tested French visemes.
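The pipeline described above (color-based lip localization followed by geometric feature extraction) can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact method: it assumes a simple red-dominance (pseudo-hue) heuristic to segment lip pixels, and uses the bounding box of the mask as a stand-in for the geometrical lip model; the function names and the threshold value are hypothetical.

```python
import numpy as np

def lip_mask(rgb, thresh=0.6):
    """Pseudo-hue red-dominance mask: lip pixels are typically redder
    than surrounding skin. (Illustrative heuristic, hypothetical threshold.)"""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    pseudo_hue = r / (r + g + 1e-6)  # avoid division by zero
    return pseudo_hue > thresh

def mouth_features(mask):
    """Width, height, and aspect ratio of the detected lip region,
    a simple stand-in for geometric-model feature extraction."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # no lip pixels found in this frame
    width = int(xs.max() - xs.min() + 1)
    height = int(ys.max() - ys.min() + 1)
    return {"width": width, "height": height, "aspect": width / height}

# Synthetic 8x8 frame: skin-toned background with a redder "lip" block.
frame = np.full((8, 8, 3), (200, 150, 120), dtype=np.uint8)
frame[3:5, 2:7] = (210, 90, 90)  # lip-colored region, 5 px wide, 2 px tall
feats = mouth_features(lip_mask(frame))
print(feats)  # → {'width': 5, 'height': 2, 'aspect': 2.5}
```

In a full system, the per-frame feature vectors would be tracked over the utterance and fed to the classification subsystem to decide which viseme was pronounced.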