Computing and using lower and upper bounds for action elimination in MDP planning

  • Authors:
  • Ugur Kuter; Jiaqiao Hu

  • Affiliations:
  • Institute for Advanced Computer Studies, University of Maryland at College Park, College Park, MD; Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY

  • Venue:
  • SARA'07: Proceedings of the 7th International Conference on Abstraction, Reformulation, and Approximation
  • Year:
  • 2007

Abstract

We describe a way to improve the performance of MDP planners by modifying them to use lower and upper bounds to eliminate non-optimal actions during their search. First, we discuss a particular state-abstraction formulation of MDP planning problems and how to use that formulation to compute bounds on the Q-functions of those planning problems. Then, we describe how to incorporate those bounds into a large class of MDP planning algorithms to control their search during planning. We provide theorems establishing the correctness of this technique and an experimental evaluation demonstrating its effectiveness. We incorporated our ideas into two MDP planners: the Real-Time Dynamic Programming (RTDP) algorithm [1] and the Adaptive Multistage (AMS) sampling algorithm [2], taken respectively from the automated planning and operations research communities. Our experiments on an Unmanned Aerial Vehicle (UAV) path-planning problem demonstrate that our action-elimination technique provides significant speed-ups in the performance of both RTDP and AMS.
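The core pruning rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and variable names (`eliminate_actions`, `q_lower`, `q_upper`) are hypothetical, and the paper derives its actual bounds from a state-abstraction formulation rather than taking them as given. For a reward-maximizing MDP, an action can be safely discarded in a state if its upper bound on the Q-value falls below some other action's lower bound.

```python
# Hypothetical sketch of bound-based action elimination.
# q_lower[a] <= Q(s, a) <= q_upper[a] are assumed to be valid bounds
# for a fixed state s; how such bounds are computed (e.g., via the
# paper's state-abstraction formulation) is outside this sketch.

def eliminate_actions(actions, q_lower, q_upper):
    """Keep only actions whose upper bound is at least the best lower
    bound; any other action is provably non-optimal in this state."""
    best_lower = max(q_lower[a] for a in actions)
    return [a for a in actions if q_upper[a] >= best_lower]

# Illustrative bounds on Q(s, .) for three actions in some state s:
q_lower = {"a1": 4.0, "a2": 1.0, "a3": 3.5}
q_upper = {"a1": 6.0, "a2": 3.0, "a3": 5.0}
surviving = eliminate_actions(["a1", "a2", "a3"], q_lower, q_upper)
print(surviving)  # a2 is eliminated: its upper bound 3.0 < a1's lower bound 4.0
```

In a planner such as RTDP or AMS, a rule of this shape would be applied per visited state so that subsequent backups or sampling only consider the surviving actions, shrinking the effective branching factor without affecting the optimal policy.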