Sensor fusion weighting measures in Audio-Visual Speech Recognition

Authors:
Trent W. Lewis;David M. W. Powers
Affiliations:
Flinders University, Adelaide, South Australia;Flinders University, Adelaide, South Australia
Venue:
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Year:
2004

Abstract

Audio-Visual Speech Recognition (AVSR) uses vision to enhance speech recognition, but it also introduces the problem of how to fuse the audio and visual signals. Mainstream research achieves this using a weighted product of the outputs of the phoneme classifiers for the two modalities. This paper analyses current weighting measures and compares them to several new measures proposed by the authors. Most importantly, when calculating the dispersion of the classifier output, the new measures shift from analysing the variance of the distribution to analysing its skewness. Experiments in AVSR using neural networks yield some intriguing results and raise questions about the utility of such measures.
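
As an illustration only, the weighted-product fusion and the two dispersion statistics mentioned in the abstract might be sketched as follows. The abstract does not give the formulas, so the exponent form p_audio^lam * p_visual^(1 - lam), the function names, and the sample posterior values are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def weighted_product_fusion(p_audio, p_visual, lam):
    """Fuse two posterior vectors as p_audio**lam * p_visual**(1 - lam).

    Assumed form of the weighted product; lam in [0, 1] is the audio weight.
    """
    p_audio = np.asarray(p_audio, dtype=float)
    p_visual = np.asarray(p_visual, dtype=float)
    eps = 1e-12  # guards against log(0)
    log_fused = lam * np.log(p_audio + eps) + (1.0 - lam) * np.log(p_visual + eps)
    fused = np.exp(log_fused - log_fused.max())  # subtract max for stability
    return fused / fused.sum()  # renormalise so the scores sum to 1

def variance(p):
    """Population variance of the classifier output values."""
    p = np.asarray(p, dtype=float)
    return float(np.mean((p - p.mean()) ** 2))

def skewness(p):
    """Population skewness (third standardised moment) of the output values."""
    p = np.asarray(p, dtype=float)
    centred = p - p.mean()
    m2 = np.mean(centred ** 2)
    m3 = np.mean(centred ** 3)
    return float(m3 / m2 ** 1.5) if m2 > 0 else 0.0

# Hypothetical posteriors over three phoneme classes.
p_a = [0.7, 0.2, 0.1]
p_v = [0.2, 0.7, 0.1]
fused = weighted_product_fusion(p_a, p_v, lam=0.9)
```

With lam close to 1 the fused scores follow the audio classifier, and with lam close to 0 they follow the visual one. How a dispersion statistic such as variance or skewness is turned into a value of lam is the subject of the paper and is not reproduced here.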