Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees

Authors:
H. Brendan McMahan; Maxim Likhachev; Geoffrey J. Gordon
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

Abstract

MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function. The performance of Bounded RTDP is greatly aided by the introduction of a new technique to efficiently find suitable upper bounds; this technique can also be used to provide informed initialization to a wide range of other planning algorithms.
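
To make the idea of keeping two bounds concrete, here is a minimal Python sketch of a trial-based planner on a toy cost-minimizing MDP. The abstract states only that Bounded RTDP maintains upper and lower bounds on the optimal value function. Everything else here is an assumption made for illustration and is not the authors' implementation: the toy `MDP` table, the function names, the `tau` and `max_depth` parameters, the gap-weighted choice of successor, and the trial cutoff. The paper's technique for finding good initial upper bounds is not reproduced; a loose constant is used instead.

```python
import random

# Hypothetical toy problem (not from the paper): costs are minimized and
# state 2 is an absorbing, zero-cost goal.
# MDP[state][action] = (cost, [(probability, next_state), ...])
MDP = {
    0: {"safe": (2.0, [(1.0, 1)]),
        "risky": (1.0, [(0.5, 2), (0.5, 0)])},
    1: {"go": (1.0, [(1.0, 2)])},
}
GOAL = 2


def q_value(v, state, action):
    """One-step lookahead cost of `action` in `state` under estimate `v`."""
    cost, outcomes = MDP[state][action]
    return cost + sum(p * v[nxt] for p, nxt in outcomes)


def backup(v, state):
    """Bellman backup: replace v[state] by the best one-step lookahead."""
    v[state] = min(q_value(v, state, a) for a in MDP[state])


def bounded_rtdp(start, lower, upper, eps=1e-3, tau=10.0,
                 max_depth=50, max_trials=1000, seed=0):
    """Run trials from `start` until the bounds there differ by at most eps.

    `lower` and `upper` must bracket the optimal cost-to-go on entry;
    both dictionaries are tightened in place.
    """
    rng = random.Random(seed)
    visited = set()
    trials = 0
    while upper[start] - lower[start] > eps and trials < max_trials:
        trials += 1
        state = start
        for _ in range(max_depth):  # assumed cap; guards against self-loops
            if state == GOAL:
                break
            visited.add(state)
            backup(upper, state)
            backup(lower, state)
            # Act greedily with respect to the optimistic (lower) bound.
            action = min(MDP[state], key=lambda a: q_value(lower, state, a))
            _, outcomes = MDP[state][action]
            # Weight each successor by probability times remaining gap.
            weights = [p * (upper[n] - lower[n]) for p, n in outcomes]
            # End the trial once successors are well resolved relative
            # to the start state (this also covers a total weight of 0).
            if sum(weights) <= (upper[start] - lower[start]) / tau:
                break
            state = rng.choices([n for _, n in outcomes], weights=weights)[0]
    return visited, trials


if __name__ == "__main__":
    lower = {0: 0.0, 1: 0.0, 2: 0.0}    # zero is a valid lower bound on cost
    upper = {0: 10.0, 1: 10.0, 2: 0.0}  # loose but valid upper bound
    visited, trials = bounded_rtdp(0, lower, upper)
    print("bounds at start:", lower[0], upper[0])
    print("states backed up:", sorted(visited), "trials:", trials)
```

In this toy problem the greedy action from the start state never leads to state 1, so that state is never backed up, yet the gap at the start state still certifies how close the partial policy is to optimal. That is a small-scale version of touching only a fraction of the state space.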