Focused real-time dynamic programming for MDPs: squeezing more out of a heuristic

Authors:
Trey Smith;Reid Simmons
Affiliations:
Robotics Institute, Carnegie Mellon University;Robotics Institute, Carnegie Mellon University
Venue:
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Year:
2006

Abstract

Real-time dynamic programming (RTDP) is a heuristic search algorithm for solving MDPs. We present a modified algorithm called Focused RTDP (FRTDP) with several improvements. While RTDP maintains only an upper bound on the long-term reward function, FRTDP maintains two-sided bounds and bases the output policy on the lower bound. FRTDP guides search with a new rule for outcome selection, focusing on parts of the search graph that contribute most to uncertainty about the values of good policies. FRTDP has modified trial termination criteria that should allow it to solve some problems (within ε) that RTDP cannot. Experiments show that for all the problems we studied, FRTDP significantly outperforms RTDP and LRTDP, and converges with up to six times fewer backups than the state-of-the-art HDP algorithm.
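
The idea of keeping two-sided bounds and reading the policy off the lower bound can be illustrated with a minimal sketch. This is not the FRTDP algorithm from the paper: the outcome-selection rule below is a simple stand-in (follow the successor with the largest probability-weighted bound gap), and the toy MDP, the discount factor, the initial bounds, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy MDP (an assumption, not from the paper):
# MDP[state][action] = list of (probability, next_state, reward).
Outcome = Tuple[float, str, float]
MDP: Dict[str, Dict[str, List[Outcome]]] = {
    "s0": {
        "safe": [(1.0, "goal", 1.0)],
        "risky": [(0.5, "goal", 4.0), (0.5, "s1", 0.0)],
    },
    "s1": {"back": [(1.0, "s0", 0.0)]},
    "goal": {},  # terminal state with value 0
}
GAMMA = 0.9  # assumed discount factor


def q_value(mdp, values, state: str, action: str) -> float:
    """Expected one-step reward plus discounted value of the successor."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in mdp[state][action])


def backup(mdp, lower, upper, state: str) -> Optional[str]:
    """Bellman backup of both bounds; returns the action greedy on the upper bound."""
    if not mdp[state]:  # terminal: nothing to update
        return None
    best = max(mdp[state], key=lambda a: q_value(mdp, upper, state, a))
    upper[state] = q_value(mdp, upper, state, best)
    lower[state] = max(q_value(mdp, lower, state, a) for a in mdp[state])
    return best


def pick_outcome(mdp, lower, upper, state: str, action: str) -> str:
    """Stand-in rule: successor with the largest probability-weighted bound gap."""
    outcome = max(mdp[state][action], key=lambda o: o[0] * (upper[o[1]] - lower[o[1]]))
    return outcome[1]


def greedy_action(mdp, values, state: str) -> str:
    """Action that maximizes the one-step lookahead under the given value bound."""
    return max(mdp[state], key=lambda a: q_value(mdp, values, state, a))


def solve(mdp, start: str, lower, upper, epsilon=1e-3, max_trials=1000, max_depth=50):
    """Run trials from `start` until its bound gap is at most epsilon."""
    for _ in range(max_trials):
        if upper[start] - lower[start] <= epsilon:
            break
        state, visited = start, []
        for _ in range(max_depth):
            action = backup(mdp, lower, upper, state)
            if action is None:
                break
            visited.append(state)
            state = pick_outcome(mdp, lower, upper, state, action)
            if upper[state] - lower[state] <= epsilon:
                break  # successor is already resolved to within epsilon
        for s in reversed(visited):  # propagate tightened bounds back to the start
            backup(mdp, lower, upper, s)
    return lower, upper


# Assumed initial bounds: rewards lie in [0, 4], so 0 and 4 / (1 - GAMMA) are valid.
lower = {"s0": 0.0, "s1": 0.0, "goal": 0.0}
upper = {"s0": 40.0, "s1": 40.0, "goal": 0.0}
lower, upper = solve(MDP, "s0", lower, upper)
print(lower["s0"], upper["s0"], greedy_action(MDP, lower, "s0"))
```

Because both bounds start valid and each backup preserves validity, the true value of the start state stays between `lower["s0"]` and `upper["s0"]`, and the gap between them gives a stopping test that an upper bound alone cannot provide. Acting greedily on the lower bound, as in `greedy_action(MDP, lower, "s0")`, mirrors the abstract's point that the output policy is based on the lower bound.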