Audio-visual speech recognition using red exclusion and neural networks

Authors:
Trent W. Lewis; David M. W. Powers
Affiliations:
Flinders University of South Australia, Adelaide, South Australia 5001; Flinders University of South Australia, Adelaide, South Australia 5001
Venue:
ACSC '02 Proceedings of the twenty-fifth Australasian conference on Computer science - Volume 4
Year:
2002

Abstract

Automatic speech recognition (ASR) performs well under restricted conditions, but performance degrades in noisy environments. Audio-Visual Speech Recognition (AVSR) combats this by incorporating a visual signal into the recognition process. This paper briefly reviews the contribution of psycholinguistics to this endeavour and recent advances in machine AVSR. An important first step in AVSR is feature extraction from the mouth region, and a technique developed by the authors is briefly presented. The paper then examines how useful this extraction technique is at the given task when combined with several integration architectures, demonstrates that vision does in fact assist speech recognition when used in a linguistically guided fashion, and gives insight into remaining issues.