Robust joint visual attention is necessary for establishing a common frame of reference between humans and robots that interact multimodally to work together on real-world spatial tasks involving objects. We comprehensively examine one component of this process that is often implemented in an ad hoc fashion: correctly determining the object referent from deictic reference, including pointing gestures and speech. From this we describe the development of a modular spatial reasoning framework based on the decomposition and resynthesis of speech and gesture into a language of pointing and object labeling. The framework supports multimodal and unimodal access in both real-world and mixed-reality workspaces, accounts for the need to discriminate and sequence identical and proximate objects, helps overcome the inherent precision limitations of deictic gesture, and assists in extracting those gestures. We further discuss an implementation of the framework that has been deployed on two humanoid robot platforms to date.
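The abstract's core idea, resolving an object referent by fusing an imprecise pointing gesture with a spoken label, can be illustrated with a minimal sketch. This is not the paper's framework; it is a hypothetical toy resolver assuming a 2D workspace, where a pointing ray defines an angular cone of candidates and the spoken label disambiguates among proximate or identical objects. All names (`SceneObject`, `resolve_referent`) are illustrative inventions.

```python
import math
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str    # unique identifier for the object instance
    label: str   # category word a speaker might use ("red", "block", ...)
    x: float
    y: float

def resolve_referent(objects, origin, direction, spoken_label=None,
                     cone_half_angle=math.radians(20)):
    """Return the object best matching a pointing ray and optional label.

    Pointing alone is imprecise, so every object within an angular cone
    around the ray is kept as a candidate; the spoken label (if any)
    then filters among them, and the smallest angular offset wins.
    """
    ray_angle = math.atan2(direction[1], direction[0])
    best, best_angle = None, None
    for obj in objects:
        dx, dy = obj.x - origin[0], obj.y - origin[1]
        angle = abs(math.atan2(dy, dx) - ray_angle)
        angle = min(angle, 2 * math.pi - angle)  # wrap into [0, pi]
        if angle > cone_half_angle:
            continue                             # outside the pointing cone
        if spoken_label is not None and obj.label != spoken_label:
            continue                             # speech rules this object out
        if best is None or angle < best_angle:
            best, best_angle = obj, angle
    return best

# Two proximate objects sit inside the pointing cone; the gesture alone
# cannot separate them, but the label "red" resolves the reference.
scene = [SceneObject("block1", "red", 2.0, 0.1),
         SceneObject("block2", "blue", 2.0, -0.1)]
target = resolve_referent(scene, (0.0, 0.0), (1.0, 0.0), spoken_label="red")
```

The design choice here mirrors the abstract's motivation: neither modality alone suffices (the gesture is imprecise, the label may be ambiguous among identical objects), but their intersection usually yields a unique referent.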