Computing and using lower and upper bounds for action elimination in MDP planning

  • Authors:
  • Ugur Kuter; Jiaqiao Hu

  • Affiliations:
  • Institute for Advanced Computer Studies, University of Maryland at College Park, College Park, MD; Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY

  • Venue:
  • SARA'07: Proceedings of the 7th International Conference on Abstraction, Reformulation, and Approximation
  • Year:
  • 2007

Abstract

We describe a way to improve the performance of MDP planners by modifying them to use lower and upper bounds to eliminate non-optimal actions during their search. First, we discuss a particular state-abstraction formulation of MDP planning problems and how to use that formulation to compute bounds on the Q-functions of those planning problems. Then, we describe how to incorporate those bounds into a large class of MDP planning algorithms to control their search during planning. We provide theorems establishing the correctness of this technique and an experimental evaluation demonstrating its effectiveness. We incorporated our ideas into two MDP planners: the Real-Time Dynamic Programming (RTDP) algorithm [1] and the Adaptive Multistage (AMS) sampling algorithm [2], taken respectively from the automated planning and operations research communities. Our experiments on an Unmanned Aerial Vehicle (UAV) path-planning problem demonstrate that our action-elimination technique provides significant speed-ups in the performance of both RTDP and AMS.
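The core pruning rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and variable names (`eliminate_actions`, `q_lower`, `q_upper`) are hypothetical, and the paper derives its actual bounds from a state-abstraction formulation rather than taking them as given. For a reward-maximizing MDP, an action can be safely discarded in a state if its upper bound on the Q-value falls below some other action's lower bound.

```python
# Hypothetical sketch of bound-based action elimination.
# q_lower[a] <= Q(s, a) <= q_upper[a] are assumed to be valid bounds
# for a fixed state s; how such bounds are computed (e.g., via the
# paper's state-abstraction formulation) is outside this sketch.

def eliminate_actions(actions, q_lower, q_upper):
    """Keep only actions whose upper bound is at least the best lower
    bound; any other action is provably non-optimal in this state."""
    best_lower = max(q_lower[a] for a in actions)
    return [a for a in actions if q_upper[a] >= best_lower]

# Illustrative bounds on Q(s, .) for three actions in some state s:
q_lower = {"a1": 4.0, "a2": 1.0, "a3": 3.5}
q_upper = {"a1": 6.0, "a2": 3.0, "a3": 5.0}
surviving = eliminate_actions(["a1", "a2", "a3"], q_lower, q_upper)
print(surviving)  # a2 is eliminated: its upper bound 3.0 < a1's lower bound 4.0
```

In a planner such as RTDP or AMS, a rule of this shape would be applied per visited state so that subsequent backups or sampling only consider the surviving actions, shrinking the effective branching factor without affecting the optimal policy.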