Bayesian real-time dynamic programming

  • Authors:
  • Scott Sanner, Robby Goetschalckx, Kurt Driessens, Guy Shani

  • Affiliations:
  • SML Group, National ICT Australia, Canberra, Australia; Department of Computer Science, Catholic University of Leuven, Heverlee, Belgium; Department of Computer Science, Catholic University of Leuven, Heverlee, Belgium; MLAS Group, Microsoft Research, Redmond, WA

  • Venue:
  • IJCAI'09: Proceedings of the 21st International Joint Conference on Artificial Intelligence
  • Year:
  • 2009

Abstract

Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) with a restricted set of initial states by focusing dynamic programming on the envelope of states reachable from that set. RTDP often provides performance guarantees without visiting the entire state space. Building on RTDP, recent work has sought to improve its efficiency through various optimizations, including maintaining upper and lower bounds both to govern trial termination and to prioritize state exploration. In this work, we take a Bayesian perspective on these upper and lower bounds and use a value of perfect information (VPI) analysis to govern trial termination and exploration in a novel algorithm we call VPI-RTDP. VPI-RTDP improves on state-of-the-art RTDP methods, empirically yielding up to a three-fold reduction in the time and number of visited states required to achieve comparable policy performance.
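For intuition about the bound-based trial control the abstract refers to, the sketch below shows a single RTDP trial that maintains upper and lower value bounds, uses their probability-weighted gap to prioritize successor exploration, and ends the trial once the remaining gap is negligible (the bounded-RTDP scheme that VPI-RTDP builds on). This is a minimal illustration, not the paper's algorithm: the function names (`transitions`, `reward`, `actions`), the optimistic default bound of 100.0, and the simple gap-based stopping test are all assumptions standing in for the paper's VPI criterion.

```python
import random

def bounded_rtdp_trial(s0, actions, transitions, reward, V_lo, V_up,
                       gamma=0.95, eps=1e-3, max_depth=100):
    """One RTDP trial with upper/lower value bounds (illustrative sketch).

    transitions(s, a) -> list of (next_state, probability) pairs;
    reward(s, a) -> immediate reward; actions(s) -> available actions.
    V_lo / V_up are dicts of lower / upper value bounds, defaulting to
    0.0 and an optimistic 100.0 for unseen states. All names and
    defaults are hypothetical, not this paper's API.
    """
    def q(V, s, a, default):
        # One-step lookahead value of action a under bound function V.
        return reward(s, a) + gamma * sum(
            p * V.get(s2, default) for s2, p in transitions(s, a))

    s = s0
    for _ in range(max_depth):
        acts = actions(s)
        # Act greedily w.r.t. the upper bound; back up both bounds at s.
        a_best = max(acts, key=lambda a: q(V_up, s, a, 100.0))
        V_up[s] = q(V_up, s, a_best, 100.0)
        V_lo[s] = max(q(V_lo, s, a, 0.0) for a in acts)
        # Prioritize successors by probability-weighted bound gap, and
        # terminate the trial when the remaining gap is negligible.
        succs = transitions(s, a_best)
        gaps = [p * (V_up.get(s2, 100.0) - V_lo.get(s2, 0.0))
                for s2, p in succs]
        if sum(gaps) < eps:
            break
        s = random.choices([s2 for s2, _ in succs], weights=gaps)[0]
    return V_lo, V_up
```

Repeating such trials from the initial state set shrinks the bound gap only over reachable states; VPI-RTDP replaces the fixed gap test above with a Bayesian value-of-information analysis of the same bounds.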