Adaptive bimodal sensor fusion for automatic speechreading

  • Authors:
  • U. Meier; W. Hurst; P. Duchnowski

  • Affiliations:
  • Interactive Syst. Labs., Karlsruhe Univ., Germany

  • Venue:
  • ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 2
  • Year:
  • 1996

Abstract

We present work on improving the performance of automated speech recognizers by using additional visual information (lip-reading/speechreading), achieving error reductions of up to 50%. This paper focuses on different methods of combining the visual and acoustic data to improve recognition performance. We demonstrate this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. We have developed adaptive combination methods at several levels of the recognition network. Additional information, such as the estimated signal-to-noise ratio (SNR), is used in some cases. Results for the different combination methods are shown for clean speech and for data with artificial noise (white noise, music, motor noise). The new combination methods adapt automatically to varying noise conditions, making hand-tuned parameters unnecessary.
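The core idea of SNR-driven adaptive fusion can be sketched as a convex combination of the two streams' scores, with the acoustic weight rising as the estimated SNR improves. The sketch below is illustrative only: the sigmoid mapping, its `midpoint` and `slope` parameters, and the function names are assumptions for this example, not the specific combination rules or values from the paper.

```python
import math

def snr_weight(snr_db, midpoint=10.0, slope=0.5):
    """Map estimated SNR (dB) to an acoustic weight in (0, 1).

    High SNR -> trust the acoustic stream; low SNR -> lean on the
    visual stream. The sigmoid shape and its parameters are
    illustrative assumptions, not values from the paper.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint)))

def combine_scores(acoustic, visual, snr_db):
    """Convex per-class combination of the two streams' scores."""
    lam = snr_weight(snr_db)
    return [lam * a + (1.0 - lam) * v for a, v in zip(acoustic, visual)]

# Hypothetical per-phoneme activations from each stream.
acoustic = [0.2, 0.7, 0.1]
visual = [0.6, 0.3, 0.1]

# In heavy noise (0 dB) the visual scores dominate; in clean
# conditions (30 dB) the combined scores track the acoustic ones.
print(combine_scores(acoustic, visual, snr_db=0.0))
print(combine_scores(acoustic, visual, snr_db=30.0))
```

Because the weight is a function of the measured SNR, no fusion parameter needs hand-tuning per noise condition, which mirrors the adaptivity claim in the abstract.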