Pattern Recognition and Machine Learning (Information Science and Statistics)
Long-time span acoustic activity analysis from far-field sensors in smart homes
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Audio-visual speech recognition using depth information from the Kinect in noisy video conditions
Proceedings of the 5th International Conference on PErvasive Technologies Related to Assistive Environments
Speech recognition is a natural means of interaction between a human and a smart assistive environment. For this interaction to be effective, such a system should attain a high recognition rate even under adverse conditions. Audio-visual speech recognition (AVSR) can help in such environments, especially in the presence of audio noise. However, the impact of visual noise on its performance has not been studied sufficiently in the literature. In this paper, we examine the effects of visual noise on AVSR, reporting experiments on the relatively simple task of connected digit recognition under moderate acoustic noise and several types of visual noise. Such visual noise can be caused by either faulty sensors or video signal transmission problems, both of which can occur in smart assistive environments. Our AVSR system achieves higher accuracy than an audio-only recognizer and performs robustly in most of the noisy-video cases considered.