Sensor fusion weighting measures in Audio-Visual Speech Recognition

  • Authors: Trent W. Lewis; David M. W. Powers

  • Affiliations: Flinders University, Adelaide, South Australia; Flinders University, Adelaide, South Australia

  • Venue: ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
  • Year: 2004

Abstract

Audio-Visual Speech Recognition (AVSR) uses vision to enhance speech recognition, but it also introduces the problem of how to join (or fuse) the two signals. Mainstream research achieves this using a weighted product of the outputs of the phoneme classifiers for the two modalities. This paper analyses current weighting measures and compares them with several new measures proposed by the authors. Most importantly, the calculation of the dispersion of the classifier output shifts from analysing the variance of the distribution to analysing its skewness. Experiments in AVSR using neural networks produce some intriguing results that raise questions about the utility of such measures.
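
The weighted-product fusion and the two dispersion statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example posterior vectors, and in particular the rule that turns two dispersion values into an audio weight (each modality's share of the total dispersion) are assumptions made only for illustration. The abstract does not specify how the paper maps a dispersion measure to a weight.

```python
import math


def variance(p):
    """Population variance of a classifier's output vector."""
    mean = sum(p) / len(p)
    return sum((x - mean) ** 2 for x in p) / len(p)


def skewness(p):
    """Population skewness (third standardized moment) of the output vector."""
    mean = sum(p) / len(p)
    var = variance(p)
    if var == 0:
        return 0.0  # a perfectly flat output has no skew
    third = sum((x - mean) ** 3 for x in p) / len(p)
    return third / math.pow(var, 1.5)


def audio_weight(audio, visual, measure):
    """Hypothetical weighting rule (an assumption, not taken from the paper):
    the audio weight is the audio stream's share of the total dispersion.
    Negative dispersion values are clamped to zero."""
    d_audio = max(measure(audio), 0.0)
    d_visual = max(measure(visual), 0.0)
    if d_audio + d_visual == 0:
        return 0.5  # no evidence either way, so weight both equally
    return d_audio / (d_audio + d_visual)


def fuse(audio, visual, lam):
    """Weighted product of per-phoneme scores, renormalized to sum to 1.
    lam is the audio weight in [0, 1]; the visual weight is 1 - lam."""
    scores = [a ** lam * v ** (1 - lam) for a, v in zip(audio, visual)]
    total = sum(scores)
    return [s / total for s in scores]


# Made-up posteriors over four phoneme classes: a peaked audio output
# and a flatter visual output.
audio = [0.7, 0.1, 0.1, 0.1]
visual = [0.4, 0.3, 0.2, 0.1]

lam_var = audio_weight(audio, visual, variance)
lam_skew = audio_weight(audio, visual, skewness)
fused = fuse(audio, visual, lam_var)
```

In this toy case both statistics favour the peaked audio output, and skewness does so more strongly because the evenly spaced visual scores have essentially zero skew. Whether such measures actually help recognition is the question the paper's experiments examine.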