Robust Audio-Visual Speech Recognition Based on Late Integration

  • Authors:
  • Jong-Seok Lee; Cheol Hoon Park

  • Affiliations:
  • Sch. of Electr. Eng. & Comput. Sci., KAIST, Daejeon

  • Venue:
  • IEEE Transactions on Multimedia
  • Year:
  • 2008


Abstract

Audio-visual speech recognition (AVSR), which uses both the acoustic and the visual signals of speech, has received attention because of its robustness in noisy environments. In this paper, we present an AVSR system based on the late integration scheme whose robustness under various noise conditions is improved by enhancing the performance of the three parts that compose the system. First, we improve the performance of the visual subsystem by using a stochastic optimization method to train the hidden Markov models that serve as the speech recognizer. Second, we propose a new method of exploiting the dynamic characteristics of speech to improve the robustness of the acoustic subsystem. Third, the acoustic and visual subsystems are effectively integrated by neural networks to produce the final robust recognition results. We demonstrate the performance of the proposed methods via speaker-independent isolated word recognition experiments. The results show that the proposed system improves robustness over the conventional system under various noise conditions without a priori knowledge of the noise contained in the speech.
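The core idea of late (decision-level) integration can be illustrated with a minimal sketch. The assumption here is that the acoustic and visual HMM subsystems each produce per-word log-likelihoods, and that a stream weight (which the paper obtains from neural networks fed with reliability information; here it is simply a parameter named lam) balances the two streams before the final decision. The function name late_integration and the toy scores are hypothetical, not taken from the paper.

```python
import numpy as np

def late_integration(audio_loglik, visual_loglik, lam):
    """Combine per-word log-likelihoods from the two subsystems.

    audio_loglik, visual_loglik: sequences of length vocab_size
    lam: weight of the acoustic stream in [0, 1]; (1 - lam) weights the visual stream
    Returns the index of the recognized word.
    """
    combined = lam * np.asarray(audio_loglik) + (1.0 - lam) * np.asarray(visual_loglik)
    return int(np.argmax(combined))

# Toy usage: three-word vocabulary. The noisy audio stream favors word 2,
# the video stream favors word 1; a low acoustic weight lets video dominate.
audio = [-310.0, -295.0, -290.0]
video = [-120.0, -100.0, -115.0]
print(late_integration(audio, video, lam=0.3))  # prints 1 (word favored by video)
```

In a noise-adaptive system, lam would be lowered as the acoustic signal degrades, which is the role the integrating neural networks play in the proposed scheme.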