Object category recognition using probabilistic fusion of speech and image classifiers

  • Authors:
  • Kate Saenko; Trevor Darrell

  • Affiliations:
  • Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA (both authors)

  • Venue:
  • MLMI'07: Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2007

Abstract

Multimodal scene understanding is an integral part of human-robot interaction (HRI) in situated environments. Especially useful is category-level recognition, where the system can recognize classes of objects or scenes rather than specific instances (e.g., any chair vs. this particular chair). Humans use multiple modalities to understand which object category is being referred to, simultaneously interpreting gesture, speech, and visual appearance, and using one modality to disambiguate the information contained in the others. In this paper, we address the problem of fusing visual and acoustic information to predict object categories when an image of the object and speech input from the user are available to the HRI system. Using probabilistic decision fusion, we show improved classification rates on a dataset containing a wide variety of object categories, compared to using either modality alone.
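
The abstract does not spell out the exact fusion rule, so the following is only a rough illustrative sketch of one common form of probabilistic decision fusion: a weighted product of the per-class posteriors produced by the image and speech classifiers. The function name, weights, and toy class set are hypothetical and not taken from the paper.

```python
import numpy as np

def fuse_posteriors(p_image, p_speech, w_image=0.5, w_speech=0.5):
    """Weighted log-linear (product-rule) fusion of two per-class posteriors.

    p_image, p_speech: 1-D arrays of class posteriors from the image and
    speech classifiers, in the same class order. The weights are hypothetical
    tuning parameters, not values reported in the paper.
    """
    eps = 1e-12  # guard against log(0)
    log_fused = w_image * np.log(p_image + eps) + w_speech * np.log(p_speech + eps)
    fused = np.exp(log_fused - log_fused.max())  # subtract max for numerical stability
    return fused / fused.sum()                   # renormalize to a distribution

# Toy usage with three made-up object categories, e.g. {chair, table, mug}:
p_img = np.array([0.5, 0.3, 0.2])    # image classifier is uncertain
p_sp  = np.array([0.8, 0.1, 0.1])    # speech classifier strongly favors class 0
fused = fuse_posteriors(p_img, p_sp)
print(fused, fused.argmax())         # fused posterior; argmax gives the predicted category
```

The intuition matches the abstract's claim: when one modality is ambiguous (the image posterior above), the other modality's evidence shifts the fused decision, which is how combining the two can outperform either classifier alone.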