Object category detection using audio-visual cues

  • Authors:
  • Jie Luo;Barbara Caputo;Alon Zweig;Jörg-Hendrik Bach;Jörn Anemüller

  • Affiliations:
  • IDIAP Research Institute, Martigny, Switzerland and Swiss Federal Institute of Technology in Lausanne, Lausanne, Switzerland;IDIAP Research Institute, Martigny, Switzerland and Swiss Federal Institute of Technology in Lausanne, Lausanne, Switzerland;Hebrew University of Jerusalem, Jerusalem, Israel;Carl von Ossietzky University Oldenburg, Oldenburg, Germany;Carl von Ossietzky University Oldenburg, Oldenburg, Germany

  • Venue:
  • ICVS'08 Proceedings of the 6th international conference on Computer vision systems
  • Year:
  • 2008

Abstract

Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection, using audio and visual information. The auditory channel is modeled on biologically motivated spectral features via a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six different object categories, under increasingly difficult conditions, show strengths and weaknesses of the two approaches, and clearly underline the open challenges for multimodal category detection.
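
The abstract does not say how the two fusion schemes are implemented. As a rough illustration only, the sketch below shows the common meaning of the two terms: low-level fusion combines the per-modality feature vectors before a single classifier sees them, while high-level fusion combines the confidence scores of separate audio and visual classifiers. The function names, the weighted-sum rule, and the `audio_weight` parameter are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence


def low_level_fusion(audio_features: Sequence[float],
                     visual_features: Sequence[float]) -> List[float]:
    """Feature-level fusion (assumed form): concatenate both feature vectors.

    The joint vector would then be passed to a single classifier.
    """
    return list(audio_features) + list(visual_features)


def high_level_fusion(audio_score: float,
                      visual_score: float,
                      audio_weight: float = 0.5) -> float:
    """Decision-level fusion (assumed form): weighted sum of classifier scores.

    audio_weight is a hypothetical mixing parameter in [0, 1]; the visual
    score receives the remaining weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    joint = low_level_fusion([0.1, 0.2], [0.3])
    fused = high_level_fusion(audio_score=1.0, visual_score=0.0, audio_weight=0.25)
    print(joint, fused)
```

In this sketch the high-level scheme lets each modality keep its own model, so a noisy channel can be down-weighted, whereas the low-level scheme leaves it to one classifier to learn how the modalities interact.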