Temporal filtering of visual speech for audio-visual speech recognition in acoustically and visually challenging environments

  • Authors:
  • Jong-Seok Lee;Cheol Hoon Park

  • Affiliations:
  • KAIST, Daejeon, South Korea;KAIST, Daejeon, South Korea

  • Venue:
  • Proceedings of the 9th International Conference on Multimodal Interfaces
  • Year:
  • 2007

Abstract

Visual speech information has been shown to compensate effectively for the performance degradation of acoustic speech recognition in noisy environments. However, most audio-visual speech recognition systems ignore visual noise, even though it can be introduced into visual speech signals during acquisition or transmission. In this paper, we present a new temporal filtering technique for extracting noise-robust visual features. In the proposed method, a carefully designed band-pass filter is applied to the temporal sequence of values at each pixel of the lip region images in order to remove unwanted temporal variations due to visual noise, illumination conditions, or speakers' appearances. We demonstrate that the method improves not only visual speech recognition performance for both clean and noisy images but also audio-visual speech recognition performance in both acoustically and visually noisy conditions.
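As a rough illustration of the idea, the sketch below band-pass filters the time series of every pixel in a stack of lip-region frames. It is not the filter designed in the paper: the abstract does not give the filter type, order, or cut-off frequencies, so the windowed-sinc FIR design, the 30 frames/s rate, the 1 to 8 Hz pass band, and the function names `bandpass_fir` and `filter_pixel_sequences` are all assumptions chosen for the example.

```python
import numpy as np


def bandpass_fir(low_hz, high_hz, fs, num_taps=31):
    """Windowed-sinc FIR band-pass filter (hypothetical design, not the paper's).

    Built as the difference of two ideal low-pass responses, tapered with a
    Hamming window. The tap mean is then subtracted so that the DC gain is
    exactly zero, which removes any constant brightness offset.
    """
    if not 0 < low_hz < high_hz < fs / 2:
        raise ValueError("need 0 < low_hz < high_hz < fs / 2")
    n = np.arange(num_taps) - (num_taps - 1) / 2

    def lowpass(fc):
        # Ideal low-pass impulse response with cut-off fc (np.sinc is normalized).
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    h = (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(num_taps)
    return h - h.mean()  # force zero gain at 0 Hz


def filter_pixel_sequences(frames, h):
    """Filter each pixel's temporal sequence independently.

    frames: array of shape (T, H, W), one grayscale lip image per time step.
    Returns an array of the same shape; the first and last len(h) // 2 frames
    are affected by zero padding at the sequence boundaries.
    """
    frames = np.asarray(frames, dtype=float)
    return np.apply_along_axis(
        lambda x: np.convolve(x, h, mode="same"), 0, frames
    )


# Assumed settings: 30 frames/s video, 1-8 Hz pass band.
h = bandpass_fir(1.0, 8.0, fs=30.0)
frames = np.random.default_rng(0).random((90, 16, 24)) + 100.0  # bright, noisy clip
filtered = filter_pixel_sequences(frames, h)
```

Because the filter has zero gain at 0 Hz, a constant per-pixel offset such as overall illumination level or a speaker's skin tone is suppressed, while the upper cut-off attenuates fast frame-to-frame fluctuations; this mirrors the kinds of unwanted variation the abstract lists, under the assumed band edges.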