Improving connected letter recognition by lipreading

  • Authors:
  • Christoph Bregler; Hermann Hild; Stefan Manke; Alex Waibel

  • Affiliations:
  • International Computer Science Institute, Berkeley, CA and University of Karlsruhe, Department of Computer Science, Karlsruhe 1, Germany; Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania; University of Karlsruhe, Department of Computer Science, Karlsruhe 1, Germany; Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania

  • Venue:
  • ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: plenary, special, audio, underwater acoustics, VLSI, neural networks - Volume I
  • Year:
  • 1993

Abstract

In this paper we show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called "speechreading". We demonstrate this with an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the Dynamic Time Warping algorithm. We evaluate the approach on a connected word recognition problem, the notoriously difficult letter spelling task. With speechreading, we could reduce the error rate by up to half relative to purely acoustic recognition.
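
The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed weighted sum in `combine_scores`, the simple stay-or-advance alignment in `dtw_word_score`, the two-letter lexicon, and all activation values are assumptions chosen for illustration, and both streams are assumed to be already aligned frame by frame.

```python
def combine_scores(acoustic, visual, weight_acoustic=0.7):
    """Frame-wise weighted sum of two phoneme-activation streams.

    The fixed weight is an assumption; the abstract does not say how
    the two front-end outputs are weighted.
    """
    if len(acoustic) != len(visual):
        raise ValueError("streams must have the same number of frames")
    w_a, w_v = weight_acoustic, 1.0 - weight_acoustic
    return [
        {p: w_a * a[p] + w_v * v[p] for p in a}
        for a, v in zip(acoustic, visual)
    ]


def dtw_word_score(frames, phonemes):
    """Best accumulated score of a monotonic alignment of the frames
    to the phoneme sequence (each frame stays in a state or advances
    to the next one)."""
    neg_inf = float("-inf")
    best = [neg_inf] * len(phonemes)
    best[0] = frames[0][phonemes[0]]
    for frame in frames[1:]:
        new = [neg_inf] * len(phonemes)
        for s, ph in enumerate(phonemes):
            prev = best[s] if s == 0 else max(best[s], best[s - 1])
            if prev > neg_inf:
                new[s] = prev + frame[ph]
        best = new
    return best[-1]


def recognize(frames, lexicon):
    """Return the lexicon entry with the highest alignment score."""
    return max(lexicon, key=lambda w: dtw_word_score(frames, lexicon[w]))


# Hypothetical toy data: "B" and "D" sound alike, but the lips close for /b/.
lexicon = {"B": ["b", "iy"], "D": ["d", "iy"]}
acoustic = [
    {"b": 0.45, "d": 0.55, "iy": 0.0},
    {"b": 0.45, "d": 0.55, "iy": 0.0},
    {"b": 0.10, "d": 0.10, "iy": 0.8},
    {"b": 0.10, "d": 0.10, "iy": 0.8},
]
visual = [
    {"b": 0.90, "d": 0.10, "iy": 0.0},
    {"b": 0.90, "d": 0.10, "iy": 0.0},
    {"b": 0.05, "d": 0.05, "iy": 0.9},
    {"b": 0.05, "d": 0.05, "iy": 0.9},
]

acoustic_only = recognize(acoustic, lexicon)
audio_visual = recognize(combine_scores(acoustic, visual), lexicon)
```

In this toy setup the acoustic stream alone slightly favors "D", while the combined stream favors "B" because the visual activations separate the bilabial /b/ from /d/. The numbers are invented and say nothing about the error rates reported in the paper.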