Robust Sensor Fusion: Analysis and Application to Audio Visual Speech Recognition

  • Authors:
  • Javier R. Movellan; Paul Mineiro

  • Affiliations:
  • Department of Cognitive Science, University of California San Diego, La Jolla, CA 92093-0515. E-mail: {movellan,pmineiro}@cogsci.ucsd.edu (both authors)

  • Venue:
  • Machine Learning - Special issue on context sensitivity and concept drift
  • Year:
  • 1998

Abstract

This paper analyzes the issue of catastrophic fusion, a problem that occurs in multimodal recognition systems that integrate the output from several modules while working in non-stationary environments. For concreteness we frame the analysis with regard to the problem of automatic audio visual speech recognition (AVSR), but the issues at hand are very general and arise in multimodal recognition systems which need to work in a wide variety of contexts. Catastrophic fusion is said to have occurred when the performance of a multimodal system is inferior to the performance of some isolated modules, e.g., when the performance of the audio visual speech recognition system is inferior to that of the audio system alone. Catastrophic fusion arises because recognition modules make implicit assumptions and thus operate correctly only within a certain context. Practice shows that when modules are tested in contexts inconsistent with their assumptions, their influence on the fused product tends to increase, with catastrophic results. We propose a principled solution to this problem based upon Bayesian ideas of competitive models and inference robustification. We study the approach analytically on a classic Gaussian discrimination task and then apply it to a realistic problem on audio visual speech recognition (AVSR) with excellent results.
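
The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method or experimental setup. It is a toy two-class Gaussian discrimination task in which every name and parameter (`llr`, `simulate`, class means of -1 and +1, an audio noise variance of 25) is an assumption chosen for illustration. Both modules assume unit variance, but the audio channel is tested in a much noisier context, so its log-likelihood ratios are inflated and dominate the naive fused decision. The "reweighted" variant simply plugs in the true audio variance as a stand-in for a context-aware correction; it is not the Bayesian robustification proposed in the paper.

```python
import random


def llr(x, sigma2):
    """Log-likelihood ratio log p(x|+1)/p(x|-1) for N(+1, sigma2) vs N(-1, sigma2)."""
    return 2.0 * x / sigma2


def simulate(n=20000, audio_noise_var=25.0, assumed_var=1.0, seed=0):
    """Return the accuracy of each decision rule on n simulated trials.

    Hypothetical setup: the visual channel matches the assumed variance,
    while the audio channel is corrupted by noise with variance audio_noise_var.
    """
    rng = random.Random(seed)
    hits = {"audio": 0, "visual": 0, "naive_fusion": 0, "reweighted_fusion": 0}
    for _ in range(n):
        y = rng.choice((-1, 1))                       # true class label
        x_a = rng.gauss(y, audio_noise_var ** 0.5)    # audio, out of context
        x_v = rng.gauss(y, assumed_var ** 0.5)        # visual, in context
        l_a = llr(x_a, assumed_var)                   # audio module trusts its wrong assumption
        l_v = llr(x_v, assumed_var)
        l_a_true = llr(x_a, audio_noise_var)          # audio LLR under the true variance
        scores = {
            "audio": l_a,
            "visual": l_v,
            "naive_fusion": l_a + l_v,                # sum of LLRs (independence assumed)
            "reweighted_fusion": l_a_true + l_v,
        }
        for name, s in scores.items():
            hits[name] += (s > 0) == (y > 0)
    return {name: count / n for name, count in hits.items()}


acc = simulate()
for name, value in acc.items():
    print(f"{name:>18}: {value:.3f}")
```

Under these assumptions the visual module alone should score near 0.84, while naive fusion drops to roughly 0.65, that is, below the best isolated module. Reweighting the audio channel by its true variance brings the fused accuracy back to about the level of the visual module.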