Bayesian reinforcement learning (RL) aims to make more efficient use of data samples, but typically requires significantly more computation. For discrete Markov decision processes, a typical approach to Bayesian RL is to sample a set of models from an underlying posterior distribution and compute a value function for each, e.g., using dynamic programming. This makes the computational cost per sampled model very high. Furthermore, the number of model samples to take at each step has mainly been chosen in an ad hoc fashion. We propose a principled method for determining the number of models to sample, based on the parameters of the posterior distribution over models. Our sampling method is local, in that we may choose a different number of samples for each state-action pair. We establish bounds on the error in the value function between a random model sample and the mean model of the posterior distribution. We compare our algorithm against state-of-the-art methods and demonstrate that our method provides a better trade-off between performance and running time.
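To make the sampling pipeline in the abstract concrete, here is a minimal sketch, not the authors' implementation: it maintains an independent Dirichlet posterior over each state-action's transition distribution, draws model samples from it, solves each sampled model by value iteration (dynamic programming), and compares the resulting value functions to that of the posterior mean model, which is the quantity the error bounds concern. All names and parameters (`value_iteration`, `sample_models`, `alpha`, the toy MDP sizes) are illustrative assumptions, and the fixed `n_samples` stands in for the paper's principled, posterior-dependent choice of sample count.

```python
# Sketch: posterior sampling for a discrete MDP with Dirichlet transition
# posteriors, solved by value iteration. Illustrative only.
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Solve a discrete MDP by dynamic programming.
    P: (S, A, S) transition tensor, R: (S, A) reward matrix."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_t P(s, a, t) V(t)
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def sample_models(alpha, n_samples, rng):
    """Draw transition models from independent Dirichlet posteriors.
    alpha: (S, A, S) Dirichlet pseudo-counts (posterior parameters)."""
    S, A, _ = alpha.shape
    for _ in range(n_samples):
        P = np.empty_like(alpha, dtype=float)
        for s in range(S):
            for a in range(A):
                P[s, a] = rng.dirichlet(alpha[s, a])
        yield P

# Toy usage: compare value functions of sampled models to the mean model.
rng = np.random.default_rng(0)
S, A = 4, 2
alpha = 1.0 + rng.integers(0, 10, size=(S, A, S))  # posterior pseudo-counts
R = rng.random((S, A))

P_mean = alpha / alpha.sum(axis=2, keepdims=True)  # posterior mean model
V_mean = value_iteration(P_mean, R)

errors = [np.max(np.abs(value_iteration(P, R) - V_mean))
          for P in sample_models(alpha, n_samples=20, rng=rng)]
print("max |V_sample - V_mean| over 20 samples:", max(errors))
```

The sketch makes the computational trade-off visible: each sampled model costs a full value-iteration solve, so the number of samples directly drives running time, which is why choosing that number from the posterior parameters, per state-action pair rather than globally, is the focus of the method.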