Audio-visual robot command recognition: D-META'12 grand challenge

  • Authors:
  • Jordi Sanchez-Riera;Xavier Alameda-Pineda;Radu Horaud

  • Affiliations:
  • INRIA Rhone Alpes, Montbonnot, France;INRIA Rhone Alpes, Montbonnot, France;INRIA Rhone Alpes, Montbonnot, France

  • Venue:
  • Proceedings of the 14th ACM international conference on Multimodal interaction
  • Year:
  • 2012

Abstract

This paper addresses the problem of audio-visual command recognition in the framework of the D-META Grand Challenge. Temporal and non-temporal learning models are trained on visual and auditory descriptors. To set a proper baseline, the methods are tested on the "Robot Gestures" scenario of the publicly available RAVEL data set, following a leave-one-out cross-validation strategy. The classification-level audio-visual fusion strategy makes it possible to compensate for the errors of the unimodal (audio or vision) classifiers. The results obtained (an average audio-visual recognition rate of almost 80%) encourage us to investigate how to further develop and improve the methodology described in this paper.
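
The abstract does not state which fusion rule is used, so the following is only a minimal sketch of what classification-level (late) fusion can look like, assuming each unimodal classifier outputs one score per command. The function name `fuse_scores`, the weighted-sum rule, the `audio_weight` parameter, the command labels, and the scores are all hypothetical and are not taken from the paper or from the RAVEL data set.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.5):
    """Combine per-class scores from two unimodal classifiers.

    Both inputs map a command label to a score in [0, 1]. The weighted
    sum is an assumed fusion rule, not the one used in the paper.
    """
    if audio_scores.keys() != visual_scores.keys():
        raise ValueError("both classifiers must score the same labels")
    fused = {
        label: audio_weight * audio_scores[label]
        + (1.0 - audio_weight) * visual_scores[label]
        for label in audio_scores
    }
    # Return the label with the highest fused score.
    return max(fused, key=fused.get)


# Made-up scores: audio alone would pick "stop", vision alone "hello".
audio = {"hello": 0.40, "stop": 0.45, "turn": 0.15}
visual = {"hello": 0.70, "stop": 0.10, "turn": 0.20}
decision = fuse_scores(audio, visual)
```

In this toy case the audio classifier alone would choose "stop", but the more confident visual score pulls the fused decision to "hello", which illustrates how one modality may compensate for an error in the other.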