The acquisition of lexical semantics for spatial terms: a connectionist model of perceptual categorization
A location representation for generating descriptive walking directions
Proceedings of the 10th International Conference on Intelligent User Interfaces
Applying computational models of spatial prepositions to visually situated dialog
Computational Linguistics
Walk the talk: connecting language, knowledge, and action in route instructions
AAAI'06 Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2
Grounding spatial prepositions for video search
Proceedings of the 2009 International Conference on Multimodal Interfaces
Where to go: interpreting natural directions using global inference
ICRA'09 Proceedings of the 2009 IEEE International Conference on Robotics and Automation
Utilizing object-object and object-scene context when planning to find things
ICRA'09 Proceedings of the 2009 IEEE International Conference on Robotics and Automation
Improved Techniques for Grid Mapping With Rao-Blackwellized Particle Filters
IEEE Transactions on Robotics
Spatial language for human-robot dialogs
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
Interpretation of Spatial Language in a Map Navigation Task
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
A game-theoretic approach to generating spatial descriptions
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Grounding spatial language for video search
International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
Comparing spoken language route instructions for robots across environment representations
SIGDIAL '10 Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Modeling environments from a route perspective
Proceedings of the 6th International Conference on Human-Robot Interaction
Hierarchical dialogue system for guide robot in shopping mall environments
TSD'11 Proceedings of the 14th International Conference on Text, Speech and Dialogue
Spatial role labeling: Towards extraction of spatial relations from natural language
ACM Transactions on Speech and Language Processing (TSLP)
Facilitating mental modeling in collaborative human-robot interaction through adverbial cues
SIGDIAL '11 Proceedings of the SIGDIAL 2011 Conference
Towards automatic functional test execution
Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces
Fast online lexicon learning for grounded language acquisition
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Toward learning perceptually grounded word meanings from unaligned parallel data
SIAC '12 Proceedings of the Second Workshop on Semantic Interpretation in an Actionable Context
Unsupervised PCFG induction for grounded language learning with highly ambiguous supervision
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
The structure and generality of spoken route instructions
SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Human behavior understanding for robotics
HBU'12 Proceedings of the Third International Conference on Human Behavior Understanding
Interacting with a robot: a guide robot understanding natural language instructions
UCAmI'12 Proceedings of the 6th International Conference on Ubiquitous Computing and Ambient Intelligence
Understanding suitable locations for waiting
Proceedings of the 8th ACM/IEEE International Conference on Human-Robot Interaction
KnowRob: A knowledge processing infrastructure for cognition-enabled robots
International Journal of Robotics Research
AGI'13 Proceedings of the 6th International Conference on Artificial General Intelligence
Extracting Spatial Information From Place Descriptions
Proceedings of The First ACM SIGSPATIAL International Workshop on Computational Models of Place
Speaking in unconstrained natural language is an intuitive and flexible way for humans to interact with robots. Understanding this kind of linguistic input is challenging because diverse words and phrases must be mapped into structures that the robot can understand, and elements of those structures must be grounded in an uncertain environment. We present a system that follows natural language directions by extracting a sequence of spatial description clauses from the linguistic input and then inferring the most probable path through the environment, given only the environment's geometry and the objects detected within it. We use a probabilistic graphical model that factors into three key components. The first component grounds landmark phrases such as "the computers" in the robot's perceptual frame by exploiting co-occurrence statistics from databases of tagged images such as Flickr. The second, a spatial reasoning component, judges how well spatial relations such as "past the computers" describe a path. The third models verb phrases such as "turn right" according to the change in orientation along the path. Our system follows 60% of the directions in our corpus to within 15 meters of the true destination, significantly outperforming other approaches.
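The factored scoring the abstract describes can be sketched as a product of per-clause scores. The following is a hypothetical illustration only, not the authors' implementation: the clause dictionaries, the co-occurrence table, the exponential distance model for "past", and the Gaussian turn-angle model are all assumptions made for this example.

```python
# Toy sketch of a three-factor path-scoring model for direction following.
# All data structures and parameter values here are illustrative assumptions.
import math

def landmark_score(phrase, detected_objects, cooccurrence):
    """Ground a landmark phrase in detected objects using co-occurrence
    counts (e.g., gathered from tagged-image data such as Flickr)."""
    total = sum(c for (p, _), c in cooccurrence.items() if p == phrase)
    if total == 0:
        return 1e-9  # unseen phrase: small floor instead of zero
    best = 1e-9
    for obj in detected_objects:
        best = max(best, cooccurrence.get((phrase, obj), 0) / total)
    return best

def spatial_score(relation, min_distance_m):
    """Score a spatial relation against path geometry; here, 'past'
    prefers paths that come close to the landmark (assumed model)."""
    if relation == "past":
        return math.exp(-min_distance_m / 5.0)
    return 0.5  # uninformative score for unmodeled relations

def verb_score(verb, heading_change_deg):
    """Score a verb phrase by the path's change in orientation,
    with a Gaussian penalty around an expected turn angle."""
    expected = {"turn right": -90.0, "turn left": 90.0, "go straight": 0.0}
    if verb not in expected:
        return 0.5
    return math.exp(-((heading_change_deg - expected[verb]) ** 2)
                    / (2 * 30.0 ** 2))

def path_score(clauses, path, detected_objects, cooccurrence):
    """Factored model: multiply the scores of all spatial description
    clauses; the best path maximizes this product."""
    score = 1.0
    for c in clauses:
        if c["type"] == "landmark":
            score *= landmark_score(c["text"], detected_objects, cooccurrence)
        elif c["type"] == "spatial":
            score *= spatial_score(c["text"], path["min_distance_m"])
        elif c["type"] == "verb":
            score *= verb_score(c["text"], path["heading_change_deg"])
    return score
```

In a full system the product would be evaluated for every candidate path and the argmax returned; the sketch keeps only the factorization, which is the structural point the abstract makes.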