Smoothed Sarsa: reinforcement learning for robot delivery tasks

Authors:
Deepak Ramachandran; Rakesh Gupta
Affiliations:
Computer Science Dept., University of Illinois at Urbana-Champaign, Urbana, IL; Honda Research Institute USA, Inc., Mountain View, CA
Venue:
ICRA'09 Proceedings of the 2009 IEEE international conference on Robotics and Automation
Year:
2009

Abstract

Our goal in this work is to make high-level decisions for mobile robots. In particular, given a queue of prioritized object delivery tasks, we wish to find, in real time, a sequence of actions that accomplishes these tasks efficiently. We introduce a novel reinforcement learning algorithm called Smoothed Sarsa, which learns a good policy for these delivery tasks by delaying the backup reinforcement step until the uncertainty in the state estimate has been reduced. The state space is modeled by a Dynamic Bayesian Network and updated using a Region-based Particle Filter. Because decision-making needs only discrete (topological) representations of entity locations, we exploit this fact to make both tracking and decision-making more efficient. Our experiments show that policy search leads to faster task completion times and higher total reward than a manually crafted policy. Smoothed Sarsa learns a policy orders of magnitude faster than previous policy search algorithms. We demonstrate our results on the Player/Stage simulator and on the Pioneer robot.
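The idea of delaying the backup until the state estimate is less uncertain can be illustrated with a small tabular sketch. This is not the authors' implementation: the class name `DelayedSarsa`, the entropy threshold `max_entropy`, the use of the most likely state as the backup target, and the belief format (a dictionary from discrete state to probability, indexed by time step) are all assumptions made for illustration. The abstract does not specify the criterion for when a backup is applied.

```python
import math
from collections import defaultdict, deque


def entropy(belief):
    """Shannon entropy (in nats) of a belief given as {state: probability}."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0.0)


class DelayedSarsa:
    """Hypothetical sketch: Sarsa whose backups wait for a sharper state estimate."""

    def __init__(self, alpha=0.1, gamma=0.95, max_entropy=0.5):
        self.q = defaultdict(float)      # Q-values keyed by (state, action)
        self.alpha = alpha               # learning rate
        self.gamma = gamma               # discount factor
        self.max_entropy = max_entropy   # assumed confidence threshold
        self.pending = deque()           # transitions awaiting their backup

    def record(self, t, action, reward, next_t, next_action):
        """Queue a transition by time index; the states are resolved later."""
        self.pending.append((t, action, reward, next_t, next_action))

    def flush(self, smoothed):
        """Apply queued backups, oldest first, while beliefs are sharp enough.

        `smoothed` maps a time step to the latest belief about the state at
        that step. Returns the number of backups applied.
        """
        applied = 0
        while self.pending:
            t, a, r, t2, a2 = self.pending[0]
            b, b2 = smoothed[t], smoothed[t2]
            if entropy(b) > self.max_entropy or entropy(b2) > self.max_entropy:
                break  # still too uncertain: keep waiting
            s = max(b, key=b.get)      # most likely state at time t
            s2 = max(b2, key=b2.get)   # most likely state at time t2
            target = r + self.gamma * self.q[(s2, a2)]
            self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
            self.pending.popleft()
            applied += 1
        return applied
```

In use, the agent would call `record` after every action and `flush` whenever the tracker revises its beliefs. A transition whose state is ambiguous at first is backed up only once later observations have sharpened the estimate for that time step.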