Multimodal interaction abilities for a robot companion

Authors:
Brice Burger;Isabelle Ferrané;Frédéric Lerasle
Affiliations:
CNRS, LAAS, Toulouse, France and IRIT, Toulouse, France and Université de Toulouse, UPS, INSA, INP, ISAE, LAAS, CNRS, Toulouse, France;IRIT, Toulouse, France and Université de Toulouse, UPS, INSA, INP, ISAE, LAAS, CNRS, Toulouse, France;CNRS, LAAS, Toulouse, France and Université de Toulouse, UPS, INSA, INP, ISAE, LAAS, CNRS, Toulouse, France
Venue:
ICVS'08 Proceedings of the 6th international conference on Computer vision systems
Year:
2008

Abstract

Among the cognitive abilities a robot companion must be endowed with, human perception and speech understanding are both fundamental in the context of multimodal human-robot interaction. To provide a mobile robot with visual perception of its user and the means to handle verbal and multimodal communication, we have developed and integrated two components. In this paper we focus on an interactively distributed multiple object tracker dedicated to two-handed gestures and head location in 3D. Its relevance is highlighted by on- and off-line evaluations on data acquired by the robot. Implementation and preliminary experiments on a household robot companion, including speech recognition and understanding as well as basic fusion with gesture, are then demonstrated. The latter illustrate how vision can assist speech by specifying location references and object/person IDs in verbal statements, so that natural deictic commands given by human beings can be interpreted. Extensions of our work are finally discussed.
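
To make the idea of gesture assisting speech concrete, the following is a minimal sketch of how a deictic word in a spoken command might be resolved from tracked 3D head and hand positions. It is an illustration only, not the fusion method of the paper: the head-to-hand pointing ray, the angular threshold `max_angle_deg`, the command dictionary layout, and the function names `resolve_deictic` and `interpret` are all assumptions introduced here.

```python
import math


def resolve_deictic(head, hand, objects, max_angle_deg=20.0):
    """Return the name of the object best aligned with the head->hand ray.

    Hypothetical helper: `head` and `hand` are 3D points (x, y, z) such as a
    tracker could provide; `objects` maps names to 3D positions. Returns None
    when no object lies within `max_angle_deg` of the pointing direction.
    """
    ray = [h - e for h, e in zip(hand, head)]
    ray_norm = math.sqrt(sum(c * c for c in ray))
    if ray_norm == 0.0:
        return None  # head and hand coincide: no pointing direction

    best_name, best_angle = None, max_angle_deg
    for name, pos in objects.items():
        to_obj = [p - e for p, e in zip(pos, head)]
        obj_norm = math.sqrt(sum(c * c for c in to_obj))
        if obj_norm == 0.0:
            continue
        cos_a = sum(r * o for r, o in zip(ray, to_obj)) / (ray_norm * obj_norm)
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


def interpret(command, head, hand, objects):
    """Fill in a deictic object slot ("this"/"that") using the gesture."""
    if command.get("object") in {"this", "that"}:
        return dict(command, object=resolve_deictic(head, hand, objects))
    return command


# Assumed scene: positions in metres in the robot frame.
scene = {"bottle": (1.5, 0.0, 0.6), "chair": (0.0, 2.0, 0.5)}
parsed = {"action": "take", "object": "that"}
result = interpret(parsed, head=(0.0, 0.0, 1.6), hand=(0.3, 0.0, 1.4), objects=scene)
```

In this sketch the spoken command supplies the action and the gesture supplies the missing referent. A real system would also have to align the gesture with the utterance in time and handle tracker uncertainty, which is left out here.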