Symbolic bounded real-time dynamic programming

Authors:
Karina Valdivia Delgado;Cheng Fang;Scott Sanner;Leliane Nunes De Barros
Affiliations:
University of São Paulo, São Paulo, Brazil;University of Sydney, Sydney, Australia;National ICT Australia, Canberra, Australia;University of São Paulo, São Paulo, Brazil
Venue:
SBIA'10 Proceedings of the 20th Brazilian conference on Advances in artificial intelligence
Year:
2010

Abstract

Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted. Because it visits (and updates) only a fraction of the state space, it can be used to solve problems with intractably large state spaces. To improve the performance of RTDP, a variant based on symbolic representation, named sRTDP, was proposed. Traditional RTDP approaches work best on problems with sparse transition matrices, where they can often achieve ε-convergence efficiently without visiting all states. On problems with dense transition matrices, however, where most states are reachable in one step, sRTDP shows an advantage over traditional RTDP of up to three orders of magnitude, as we demonstrate in this paper. We also specify a new variant of sRTDP based on bounded RTDP (BRTDP), named sBRTDP, which converges quickly compared with other RTDP variants because it performs fewer updates by making a better choice of the next state to visit.
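
To make the idea of updating only visited states concrete, the following is a minimal sketch of plain (flat, enumerated-state) RTDP on a toy problem. It is not the symbolic sRTDP or sBRTDP of the paper, which operate on decision-diagram representations. The MDP, the unit step cost, and all names (`TRANSITIONS`, `q_value`, `rtdp_trial`, `rtdp`) are assumptions made for illustration only.

```python
import random

# Hypothetical toy MDP (not from the paper): states 0..3, state 3 is the goal.
# TRANSITIONS[s][a] is a list of (probability, next_state); every step costs 1.
TRANSITIONS = {
    0: {"safe": [(1.0, 1)], "risky": [(0.5, 3), (0.5, 0)]},
    1: {"safe": [(1.0, 2)], "risky": [(0.5, 3), (0.5, 0)]},
    2: {"safe": [(1.0, 3)]},
}
GOAL = 3
COST = 1.0


def q_value(values, state, action):
    """Expected cost of `action` in `state` under the current value estimates."""
    outcomes = TRANSITIONS[state][action]
    return COST + sum(p * values.get(nxt, 0.0) for p, nxt in outcomes)


def rtdp_trial(values, start, rng, max_steps=100):
    """Run one trial, backing up only the states actually visited."""
    state = start
    for _ in range(max_steps):
        if state == GOAL:
            break
        # Greedy Bellman backup at the current state only.
        best = min(TRANSITIONS[state], key=lambda a: q_value(values, state, a))
        values[state] = q_value(values, state, best)
        # Sample the next state from the greedy action's outcome distribution.
        outcomes = TRANSITIONS[state][best]
        state = rng.choices(
            [nxt for _, nxt in outcomes],
            weights=[p for p, _ in outcomes],
        )[0]


def rtdp(start=0, trials=200, seed=0):
    """Repeat trials from a fixed initial state and return the value table."""
    rng = random.Random(seed)
    # States missing from the table are treated as cost 0, which is an
    # admissible (optimistic) initial estimate for a cost-minimizing problem.
    values = {}
    for _ in range(trials):
        rtdp_trial(values, start, rng)
    return values
```

Calling `rtdp()` returns a dictionary holding estimates only for states that some trial reached from the initial state. In a large MDP that table would typically cover a small part of the state space, which is the property the abstract relies on.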