Improving connected letter recognition by lipreading

Authors:
Christoph Bregler;Hermann Hild;Stefan Manke;Alex Waibel
Affiliations:
International Computer Science Institute, Berkeley, CA and University of Karlsruhe, Department of Computer Science, Karlsruhe 1, Germany;Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania;University of Karlsruhe, Department of Computer Science, Karlsruhe 1, Germany;Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania
Venue:
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: plenary, special, audio, underwater acoustics, VLSI, neural networks - Volume I
Year:
1993

Abstract

In this paper we show how recognition performance in automated speech perception can be significantly improved by adding lipreading, so-called "speechreading". We demonstrate this with an extension of an existing state-of-the-art speech recognition system, a modular Multi-State Time Delay Neural Network (MS-TDNN). The acoustic and visual speech data are preclassified by two separate front-end phoneme TDNNs, and the results are combined into acoustic-visual hypotheses for the Dynamic Time Warping algorithm. The approach is evaluated on a connected word recognition problem, the notoriously difficult letter spelling task. With speechreading, the error rate was reduced to as little as half that of purely acoustic recognition.
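
The pipeline described in the abstract (two phoneme-level front ends whose outputs are merged and then aligned by Dynamic Time Warping) can be illustrated with a minimal sketch. It is not the authors' implementation: the frame-wise weighted sum in `combine_scores`, the `weight` parameter, the toy score matrices, and the two hypothetical "words" are all assumptions made for illustration, since the abstract does not specify how the two streams are combined. The real system uses TDNN outputs in place of these hand-written scores.

```python
from typing import List, Sequence

Frames = List[List[float]]  # frames x phonemes matrix of scores


def combine_scores(acoustic: Frames, visual: Frames, weight: float = 0.5) -> Frames:
    """Frame-wise weighted sum of per-phoneme scores (assumed combination rule)."""
    if len(acoustic) != len(visual):
        raise ValueError("both streams must have the same number of frames")
    return [
        [weight * a + (1.0 - weight) * v for a, v in zip(a_frame, v_frame)]
        for a_frame, v_frame in zip(acoustic, visual)
    ]


def dtw_word_score(scores: Frames, phoneme_ids: Sequence[int]) -> float:
    """Score of the best monotonic alignment of all frames to a phoneme sequence.

    Each frame is assigned to one phoneme of the word; from one frame to the
    next the path either stays in the same phoneme or advances by one.
    Higher is better.
    """
    neg_inf = float("-inf")
    n_states = len(phoneme_ids)
    # The path must start in the first phoneme of the word.
    prev = [neg_inf] * n_states
    prev[0] = scores[0][phoneme_ids[0]]
    for frame in scores[1:]:
        cur = [neg_inf] * n_states
        for s in range(n_states):
            best = prev[s] if s == 0 else max(prev[s], prev[s - 1])
            if best > neg_inf:
                cur[s] = best + frame[phoneme_ids[s]]
        prev = cur
    # The path must end in the last phoneme of the word.
    return prev[-1]


# Toy data: 4 frames, 3 phonemes. The acoustic stream cannot tell
# phoneme 0 from phoneme 1; the visual stream favours phoneme 0.
acoustic = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
visual = [[0.9, 0.1, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

word_a = [0, 2]  # hypothetical word: phoneme 0 followed by phoneme 2
word_b = [1, 2]  # hypothetical word: phoneme 1 followed by phoneme 2

combined = combine_scores(acoustic, visual, weight=0.5)
acoustic_tie = dtw_word_score(acoustic, word_a) == dtw_word_score(acoustic, word_b)
visual_helps = dtw_word_score(combined, word_a) > dtw_word_score(combined, word_b)
```

In this toy setting the purely acoustic scores leave the two words tied, while the combined scores prefer `word_a`. This mirrors, in a very simplified way, how visual evidence can separate acoustically confusable letters.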