Focusing computational visual attention in multi-modal human-robot interaction

  • Authors:
  • Boris Schauerte; Gernot A. Fink

  • Affiliations:
  • Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe; TU Dortmund University, Dortmund

  • Venue:
  • International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
  • Year:
  • 2010

Abstract

Identifying objects that are referred to verbally and non-verbally is an important aspect of human-robot interaction. Most importantly, it is essential for achieving a joint focus of attention and, thus, natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence visual search, i.e., the task of finding a specific object in a scene. To this end, we combine positional information obtained from pointing gestures with contextual knowledge about the visual appearance of the referred-to object obtained from language. The available information is then integrated into a biologically motivated saliency model that forms the basis for visual search. We demonstrate the feasibility of the proposed approach by presenting the results of an experimental evaluation.
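
The idea of combining a pointing-derived position cue with a language-derived appearance cue can be illustrated with a small toy sketch. It is not the authors' model: the abstract does not specify the saliency computation, so the Gaussian spatial prior, the use of color as the only appearance feature, the multiplicative combination, and all function names below are assumptions made purely for illustration.

```python
import math

def pointing_prior(width, height, target_xy, sigma):
    """Assumed spatial prior: a Gaussian centered on the image position
    that the pointing gesture is estimated to indicate."""
    tx, ty = target_xy
    return [[math.exp(-((x - tx) ** 2 + (y - ty) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def color_similarity(image, target_rgb):
    """Assumed appearance cue: per-pixel similarity in [0, 1] to a color
    named in the utterance (e.g. "the red cup")."""
    max_dist = math.sqrt(3 * 255 ** 2)  # largest possible RGB distance
    return [[1.0 - math.dist(px, target_rgb) / max_dist for px in row]
            for row in image]

def combine(prior, appearance):
    """Assumed integration rule: pixel-wise product of both cue maps."""
    return [[p * a for p, a in zip(p_row, a_row)]
            for p_row, a_row in zip(prior, appearance)]

def focus_of_attention(saliency):
    """Return the (x, y) position of the most salient pixel."""
    _, x, y = max((v, x, y) for y, row in enumerate(saliency)
                  for x, v in enumerate(row))
    return x, y

# Toy 5x5 scene: gray background with two red objects.
GRAY, RED = (128, 128, 128), (255, 0, 0)
image = [[GRAY] * 5 for _ in range(5)]
image[1][1] = RED   # red object at x=1, y=1
image[3][4] = RED   # red object at x=4, y=3

# The gesture is assumed to indicate roughly (4, 4); language says "red".
prior = pointing_prior(5, 5, target_xy=(4, 4), sigma=1.5)
appearance = color_similarity(image, RED)
focus = focus_of_attention(combine(prior, appearance))
print(focus)  # (4, 3): the red object closest to the pointed-at location
```

In this toy scene, the color cue alone cannot separate the two red objects and the pointing cue alone lands on background; only the combined map singles out the object that matches both cues.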