Audio visual speech recognition in noisy visual environments

Authors:
Georgios Galatas;Gerasimos Potamianos;Alexandros Papangelis;Fillia Makedon
Affiliations:
Institute of Informatics and Telecommunications, NCSR Demokritos, Greece, and University of Texas at Arlington;Institute of Informatics and Telecommunications, NCSR Demokritos, Greece;Institute of Informatics and Telecommunications, NCSR Demokritos, Greece, and University of Texas at Arlington;University of Texas at Arlington
Venue:
Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments
Year:
2011

Abstract

Speech recognition is a natural means of interaction between a human and a smart assistive environment. For this interaction to be effective, such a system should attain a high recognition rate even under adverse conditions. Audio-visual speech recognition (AVSR) can help in such environments, especially in the presence of audio noise. However, the impact of visual noise on its performance has not been studied sufficiently in the literature. In this paper, we examine the effects of visual noise on AVSR, reporting experiments on the relatively simple task of connected digit recognition under moderate acoustic noise and a variety of types of visual noise. Such visual noise can be caused by either faulty sensors or video signal transmission problems, both of which can occur in smart assistive environments. Our AVSR system exhibits higher accuracy than an audio-only recognizer and robust performance in most of the noisy video conditions considered.