Focusing computational visual attention in multi-modal human-robot interaction

Authors:
Boris Schauerte; Gernot A. Fink
Affiliations:
Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe; TU Dortmund University, Dortmund
Venue:
International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
Year:
2010


Abstract

Identifying objects that are referred to verbally and non-verbally is an important aspect of human-robot interaction. Above all, it is essential for achieving a joint focus of attention and, thus, natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence visual search, i.e. the task of finding a specific object in a scene. To this end, we combine positional information obtained from pointing gestures with contextual knowledge about the visual appearance of the referred-to object obtained from language. The available information is then integrated into a biologically motivated saliency model that forms the basis for visual search. We demonstrate the feasibility of the proposed approach by presenting the results of an experimental evaluation.
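The abstract does not specify how the two cues enter the saliency model, so the following Python/NumPy code is only a rough illustration of the general idea under our own assumptions, not the authors' model. It assumes that the pointing gesture yields a Gaussian angular prior around the pointing ray, that a color word (e.g. "red") is mapped to an RGB prototype whose Gaussian similarity serves as a top-down map, and that the cues are fused by a weighted sum that is then multiplied by the pointing prior. The intensity-contrast map is a crude placeholder for a biologically motivated bottom-up saliency model. All function names and parameters (`sigma`, `sigma_deg`, `w_color`) are hypothetical.

```python
import numpy as np


def normalize(m):
    """Linearly rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    lo, hi = m.min(), m.max()
    if hi - lo < 1e-12:
        return np.zeros_like(m)
    return (m - lo) / (hi - lo)


def bottom_up_map(image):
    """Hypothetical stand-in for bottom-up saliency: absolute deviation of
    each pixel's intensity from the mean image intensity."""
    intensity = image.astype(float).mean(axis=-1)
    return np.abs(intensity - intensity.mean())


def color_map(image, target_rgb, sigma=40.0):
    """Assumed top-down cue from language: Gaussian similarity of each pixel
    to an RGB prototype for the spoken color word."""
    diff = image.astype(float) - np.asarray(target_rgb, dtype=float)
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))


def pointing_prior(shape, origin, direction, sigma_deg=10.0):
    """Assumed spatial cue from the gesture: Gaussian falloff in the angle
    between the pointing direction and the ray from the hand (origin, in x/y
    pixel coordinates, y pointing down) to each pixel. Pixels behind the
    hand get zero weight."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    vx, vy = xs - origin[0], ys - origin[1]
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    dist = np.hypot(vx, vy)
    dist[dist == 0] = 1.0  # avoid dividing by zero at the hand pixel
    cos = (vx * d[0] + vy * d[1]) / dist
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    prior = np.exp(-angle ** 2 / (2.0 * sigma_deg ** 2))
    prior[cos <= 0] = 0.0
    return prior


def focus_of_attention(image, target_rgb, origin, direction, w_color=0.5):
    """Fuse the bottom-up and color maps, weight the result by the pointing
    prior, and return the (row, col) of the most salient pixel plus the map."""
    appearance = normalize((1.0 - w_color) * normalize(bottom_up_map(image))
                           + w_color * normalize(color_map(image, target_rgb)))
    saliency = appearance * pointing_prior(image.shape[:2], origin, direction)
    r, c = np.unravel_index(np.argmax(saliency), saliency.shape)
    return (int(r), int(c)), saliency


if __name__ == "__main__":
    # Toy scene: two red patches on opposite sides of the hand, one green patch.
    img = np.zeros((50, 50, 3), dtype=np.uint8)
    img[10:15, 35:40] = (255, 0, 0)
    img[35:40, 10:15] = (255, 0, 0)
    img[5:10, 42:47] = (0, 255, 0)
    focus, _ = focus_of_attention(img, (255, 0, 0), origin=(25, 25), direction=(1, -1))
    print("first fixation (row, col):", focus)
```

In this toy scene, "the red one" plus a gesture toward the upper right should place the first fixation in the upper-right red patch: the color cue ranks the green patch lower, and the prior suppresses the red patch behind the hand. The multiplicative prior is one simple design choice. An additive combination would only bias the search toward the pointed-at region instead of excluding candidates behind the hand.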