Robust joint visual attention is necessary for establishing a common frame of reference between humans and robots that interact multimodally to work together on real-world spatial tasks involving objects. We comprehensively examine one component of this process that is often implemented in an ad hoc fashion: correctly determining the object referent from deictic reference, including pointing gestures and speech. From this we describe the development of a modular spatial reasoning framework based on the decomposition and resynthesis of speech and gesture into a language of pointing and object labeling. The framework supports multimodal and unimodal access in both real-world and mixed-reality workspaces, accounts for the need to discriminate and sequence identical and proximate objects, helps overcome the inherent precision limitations of deictic gesture, and assists in extracting those gestures. We further discuss an implementation of the framework that has been deployed on two humanoid robot platforms to date.
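The abstract's core idea, resolving an object referent by fusing an imprecise pointing gesture with a spoken label, can be illustrated with a minimal sketch. This is not the paper's framework; it is a hypothetical toy resolver assuming a 2D workspace, where a pointing ray defines an angular cone of candidates and the spoken label disambiguates among proximate or identical objects. All names (`SceneObject`, `resolve_referent`) are illustrative inventions.

```python
import math
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str    # unique identifier for the object instance
    label: str   # category word a speaker might use ("red", "block", ...)
    x: float
    y: float

def resolve_referent(objects, origin, direction, spoken_label=None,
                     cone_half_angle=math.radians(20)):
    """Return the object best matching a pointing ray and optional label.

    Pointing alone is imprecise, so every object within an angular cone
    around the ray is kept as a candidate; the spoken label (if any)
    then filters among them, and the smallest angular offset wins.
    """
    ray_angle = math.atan2(direction[1], direction[0])
    best, best_angle = None, None
    for obj in objects:
        dx, dy = obj.x - origin[0], obj.y - origin[1]
        angle = abs(math.atan2(dy, dx) - ray_angle)
        angle = min(angle, 2 * math.pi - angle)  # wrap into [0, pi]
        if angle > cone_half_angle:
            continue                             # outside the pointing cone
        if spoken_label is not None and obj.label != spoken_label:
            continue                             # speech rules this object out
        if best is None or angle < best_angle:
            best, best_angle = obj, angle
    return best

# Two proximate objects sit inside the pointing cone; the gesture alone
# cannot separate them, but the label "red" resolves the reference.
scene = [SceneObject("block1", "red", 2.0, 0.1),
         SceneObject("block2", "blue", 2.0, -0.1)]
target = resolve_referent(scene, (0.0, 0.0), (1.0, 0.0), spoken_label="red")
```

The design choice here mirrors the abstract's motivation: neither modality alone suffices (the gesture is imprecise, the label may be ambiguous among identical objects), but their intersection usually yields a unique referent.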