Automatic speechreading with applications to human-computer interfaces

Authors:
Xiaozheng Zhang;Charles C. Broun;Russell M. Mersereau;Mark A. Clements
Affiliations:
Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA;Motorola Human Interface Lab, Tempe, AZ;Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA;Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA
Venue:
EURASIP Journal on Applied Signal Processing
Year:
2002

Abstract

There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Motivated by the multimodal nature of speech, the visual component is considered to yield information that is not always present in the acoustic signal and to enable improved system performance over acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI-related applications. We first introduce a new algorithm for automatically locating the mouth region by using color and motion information and segmenting the lip region by making use of both color and edge information based on Markov random fields. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We present various visual feature performance comparisons to explore their impact on the recognition accuracy, including the lip inner contour and the visibility of the tongue and teeth. By using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. The experimental results based on two databases demonstrate that the visual information is highly effective for improving recognition performance over a variety of acoustic noise levels.
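
The abstract says the mouth region is located by combining color and motion information. The following is a minimal illustrative sketch of that general idea only, not the paper's algorithm (which additionally segments the lips with Markov random fields). The function name `locate_mouth_box`, the normalized-red color cue, the frame-difference motion cue, and both threshold values are assumptions introduced here for illustration.

```python
def locate_mouth_box(prev_frame, frame, red_thresh=0.45, motion_thresh=30):
    """Bounding box of pixels that are both reddish and moving.

    Frames are lists of rows of (r, g, b) tuples with values 0..255.
    Returns (top, left, bottom, right), or None if no pixel passes both cues.
    Thresholds are hypothetical and would need tuning on real data.
    """
    rows, cols = [], []
    for y, (prev_row, row) in enumerate(zip(prev_frame, frame)):
        for x, ((pr, pg, pb), (r, g, b)) in enumerate(zip(prev_row, row)):
            total = r + g + b
            # Color cue (assumed): normalized red chromaticity above a threshold.
            reddish = total > 0 and r / total > red_thresh
            # Motion cue (assumed): mean intensity change between the two frames.
            moving = abs(total - (pr + pg + pb)) / 3 > motion_thresh
            if reddish and moving:
                rows.append(y)
                cols.append(x)
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)
```

Requiring both cues is a deliberate simplification: color alone would also pick up other reddish areas of the face, and motion alone would pick up head movement, so the sketch keeps only pixels where the two agree.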