Incrementally biasing visual search using natural language input

  • Authors and affiliations:
  • Evan Krause, Tufts University, Medford, MA, USA
  • Rehj Cantrell, Indiana University, Bloomington, IN, USA
  • Ekaterina Potapova, Vienna University of Technology, Vienna, Austria
  • Michael Zillich, Vienna University of Technology, Vienna, Austria
  • Matthias Scheutz, Tufts University, Medford, MA, USA

  • Venue:
  • Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems
  • Year:
  • 2013

Abstract

Humans expect interlocutors, both human and robot, to resolve spoken references to visually perceivable objects incrementally, as the referents are being verbally described. For this reason, tight integration of visual search with natural language processing, and real-time operation of both, are requirements for natural interactions between humans and robots. In this paper, we present an integrated robotic architecture with novel incremental vision and natural language processing. We demonstrate that incrementally refining attentional focus using linguistic constraints yields significantly better performance of the vision system than non-incremental visual processing.
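The core idea of the abstract, narrowing the set of visually attended candidates as each descriptive word arrives rather than after the full utterance, can be illustrated with a minimal sketch. This is not the paper's implementation; the `Candidate` attributes and predicate-based constraints are hypothetical stand-ins for the architecture's visual features and linguistic constraints.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    # Hypothetical visual attributes extracted by a vision system.
    color: str
    shape: str
    saliency: float

def incremental_bias(candidates, constraints):
    """Refine the attended candidate set as each linguistic
    constraint (a predicate over visual attributes) arrives,
    instead of waiting for the complete utterance."""
    attended = list(candidates)
    for constraint in constraints:
        attended = [c for c in attended if constraint(c)]
        yield list(attended)  # attentional focus after this word

scene = [
    Candidate("red", "mug", 0.9),
    Candidate("red", "box", 0.4),
    Candidate("blue", "mug", 0.7),
]

# "the red ... mug" processed word by word: each word prunes the
# search space before the utterance is finished.
steps = list(incremental_bias(
    scene,
    [lambda c: c.color == "red", lambda c: c.shape == "mug"],
))
```

After "red" the attended set shrinks to two candidates, and after "mug" to one, so visual search never has to score the full scene against the complete description at once.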