The use of visual speech information has been shown to compensate effectively for the performance degradation of acoustic speech recognition in noisy environments. However, most audio-visual speech recognition systems ignore visual noise, even though it can be introduced into visual speech signals during their acquisition or transmission. In this paper, we present a new temporal filtering technique for extracting noise-robust visual features. In the proposed method, a carefully designed band-pass filter is applied to the temporal pixel-value sequences of lip-region images in order to remove unwanted temporal variations caused by visual noise, illumination conditions, or speakers' appearances. We demonstrate that the method improves not only visual speech recognition performance on clean and noisy images but also audio-visual speech recognition performance under both acoustically and visually noisy conditions.
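The core idea, band-pass filtering each pixel's value over time, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the cutoff frequencies, filter order, and frame rate below are assumptions chosen for demonstration, and a standard Butterworth design from SciPy stands in for the paper's "carefully designed" filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_temporal_filter(frames, fps=30.0, low_hz=1.0, high_hz=10.0, order=4):
    """Band-pass filter each pixel's temporal trajectory in a lip-region
    image sequence: the low cutoff suppresses slow variations (illumination,
    speaker appearance), the high cutoff suppresses fast fluctuations
    (visual noise).

    frames: array of shape (T, H, W) -- T video frames of the lip region.
    NOTE: fps, cutoffs, and order are illustrative assumptions, not values
    taken from the paper.
    """
    nyq = fps / 2.0
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    # filtfilt along axis 0 applies zero-phase filtering to every pixel's
    # value sequence independently, so no temporal lag is introduced.
    return filtfilt(b, a, frames, axis=0)

# Toy usage: 100 frames of a 16x16 lip-region crop.
seq = np.random.rand(100, 16, 16)
filtered = bandpass_temporal_filter(seq)
```

Because the pass-band excludes DC, any constant offset per pixel (e.g. a fixed illumination bias) is removed entirely, leaving only mid-rate temporal variation, which is where articulatory lip motion is expected to live.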