We have designed and implemented a lipreading system that recognizes isolated words using only color video of human lips, without acoustic data. The system performs visual recognition using “snakes” (active contour models) to extract geometric features, the Karhunen-Loeve transform (KLT) to extract principal components in the color eigenspace, and hidden Markov models (HMMs) to recognize the combined visual feature sequences. With visual information alone, the system achieved 94% accuracy on a vocabulary of ten isolated words.
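The KLT step described above projects each lip-region frame onto the leading eigenvectors of the frame covariance, yielding a compact color-eigenspace feature vector per frame. The following is a minimal sketch of that step, not the authors' implementation; the frame dimensions, the number of components `k`, and the SVD-based computation are illustrative assumptions.

```python
import numpy as np

def klt_features(frames, k):
    """Project flattened color frames onto the top-k principal
    components (Karhunen-Loeve transform)."""
    # Flatten each H x W x 3 frame into one row of the data matrix.
    X = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    Xc = X - X.mean(axis=0)  # center the data
    # SVD of the centered data gives the covariance eigenvectors
    # as the rows of Vt, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # one k-dimensional feature vector per frame

# Toy example: 20 synthetic 8x8 RGB "lip region" frames (assumed sizes).
rng = np.random.default_rng(0)
frames = rng.random((20, 8, 8, 3))
feats = klt_features(frames, k=5)
print(feats.shape)  # (20, 5)
```

In the full pipeline, these per-frame feature vectors would be concatenated with the snake-derived geometric features and fed as observation sequences to per-word HMMs for classification.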