Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images
Journal of VLSI Signal Processing Systems
Dynamic Bayesian networks for audio-visual speech recognition
EURASIP Journal on Applied Signal Processing
Speech recognition still lacks robustness when faced with changing noise characteristics. Automatic lip reading, on the other hand, is not affected by acoustic noise and can therefore provide the speech recognizer with valuable additional information, especially since the visual modality carries information complementary to that in the audio channel. In this paper we present a novel way of processing the video signal for lip reading, together with a post-processing data transformation that can be used alongside it. The presented method, Lip Geometry Estimation (LGE), is compared with other geometry-based and image-intensity-based techniques typically deployed for this task. A large-vocabulary continuous audio-visual speech recognizer for Dutch using this method has been implemented. We show that the combined system improves upon audio-only recognition in the presence of noise.