An improved automatic lipreading system to enhance speech recognition

Authors:
E. Petajan; B. Bischoff; D. Bodoff; N. M. Brooke
Affiliations:
AT&T Bell Laboratories, Murray Hill, NJ; AT&T Bell Laboratories, Murray Hill, NJ; AT&T Bell Laboratories, Murray Hill, NJ; Univ. of Bath, Bath, UK
Venue:
CHI '88 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Year:
1988

Citing 1
Cited 13

Automatic lipreading to enhance speech recognition (speech reading)

Detection and Recognition of Periodic, Nonrigid Motion

International Journal of Computer Vision
Audio-visual tracking for natural interactivity

MULTIMEDIA '99 Proceedings of the seventh ACM international conference on Multimedia (Part 1)
Affine-Invariant Visual Features Contain Supplementary Information to Enhance Speech Recognition

AVBPA '01 Proceedings of the Third International Conference on Audio- and Video-Based Biometric Person Authentication
Moving-talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus

EURASIP Journal on Applied Signal Processing
Audio-visual speech recognition using MPEG-4 compliant visual features

EURASIP Journal on Applied Signal Processing
Synergy of Lip-Motion and Acoustic Features in Biometric Speech and Speaker Recognition

IEEE Transactions on Computers
A hybrid approach for automatic lip localization and viseme classification to enhance visual speech recognition

Integrated Computer-Aided Engineering
State-of-the-art on spatio-temporal information-based video retrieval

Pattern Recognition
On parsing visual sequences with the hidden Markov model

Journal on Image and Video Processing
Improving connected letter recognition by lipreading

ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: plenary, special, audio, underwater acoustics, VLSI, neural networks - Volume I
Mudra: a unified multimodal interaction framework

ICMI '11 Proceedings of the 13th international conference on multimodal interfaces
Which stereo matching algorithm for accurate 3D face creation?

IWCIA'04 Proceedings of the 10th international conference on Combinatorial Image Analysis
Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing

Speech Communication

Abstract

Current acoustic speech recognition technology performs well with very small vocabularies in noise or with large vocabularies in very low noise. Accurate acoustic speech recognition in noise with vocabularies over 100 words has yet to be achieved. Humans frequently lipread the visible facial speech articulations to enhance speech recognition, especially when the acoustic signal is degraded by noise or hearing impairment. Automatic lipreading has been found to significantly improve acoustic speech recognition and could be advantageous in noisy environments such as offices, aircraft and factories.

An improved version of a previously described automatic lipreading system has been developed which uses vector quantization, dynamic time warping, and a new heuristic distance measure. This paper presents visual speech recognition results from multiple speakers under optimal conditions. Results from combined acoustic and visual speech recognition are also presented which show significantly improved performance compared to the acoustic recognition system alone.
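
The abstract names vector quantization and dynamic time warping without describing them, so the following is only a minimal Python sketch of how these two generic techniques are commonly combined for template-based word recognition. It is not the authors' implementation. The function names (`quantize`, `dtw_distance`, `recognize`), the toy feature vectors, and the use of plain Euclidean distance are all assumptions made for illustration; in particular, the paper's new heuristic distance measure is not specified in this record, so Euclidean distance stands in for it.

```python
import math


def quantize(frame, codebook):
    """Vector quantization: return the index of the codebook entry
    nearest to `frame` (Euclidean distance, an assumption)."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(frame, codebook[k]))


def dtw_distance(a, b, dist=math.dist):
    """Dynamic time warping cost between two sequences of feature
    vectors, using the textbook three-way recurrence."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b[j-1] reused
                                 cost[i][j - 1],      # b advances, a[i-1] reused
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template with the lowest
    DTW cost to `utterance` (hypothetical nearest-template rule)."""
    return min(templates,
               key=lambda word: dtw_distance(utterance, templates[word]))


if __name__ == "__main__":
    # Toy one-dimensional "lip feature" sequences, invented for the demo.
    templates = {
        "one": [(0.0,), (1.0,)],
        "two": [(5.0,), (6.0,)],
    }
    print(recognize([(0.1,), (0.9,), (1.0,)], templates))
```

Time warping lets an utterance spoken faster or slower than the stored template still align with it, which is why the cost depends on the best alignment path and not on a frame-by-frame comparison.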