Bayesian reinforcement learning (RL) aims to make more efficient use of data samples, but typically requires significantly more computation. For discrete Markov decision processes, a typical approach to Bayesian RL is to sample a set of models from an underlying posterior distribution and compute a value function for each, e.g., using dynamic programming. This makes the computational cost per sampled model very high. Furthermore, the number of model samples to take at each step has mainly been chosen in an ad hoc fashion. We propose a principled method for determining the number of models to sample, based on the parameters of the posterior distribution over models. Our sampling method is local, in that we may choose a different number of samples for each state-action pair. We establish bounds on the error in the value function between a random model sample and the mean model of the posterior distribution. We compare our algorithm against state-of-the-art methods and demonstrate that our method provides a better trade-off between performance and running time.
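To make the sampling pipeline in the abstract concrete, here is a minimal sketch, not the authors' implementation: it maintains an independent Dirichlet posterior over each state-action's transition distribution, draws model samples from it, solves each sampled model by value iteration (dynamic programming), and compares the resulting value functions to that of the posterior mean model, which is the quantity the error bounds concern. All names and parameters (`value_iteration`, `sample_models`, `alpha`, the toy MDP sizes) are illustrative assumptions, and the fixed `n_samples` stands in for the paper's principled, posterior-dependent choice of sample count.

```python
# Sketch: posterior sampling for a discrete MDP with Dirichlet transition
# posteriors, solved by value iteration. Illustrative only.
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Solve a discrete MDP by dynamic programming.
    P: (S, A, S) transition tensor, R: (S, A) reward matrix."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_t P(s, a, t) V(t)
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def sample_models(alpha, n_samples, rng):
    """Draw transition models from independent Dirichlet posteriors.
    alpha: (S, A, S) Dirichlet pseudo-counts (posterior parameters)."""
    S, A, _ = alpha.shape
    for _ in range(n_samples):
        P = np.empty_like(alpha, dtype=float)
        for s in range(S):
            for a in range(A):
                P[s, a] = rng.dirichlet(alpha[s, a])
        yield P

# Toy usage: compare value functions of sampled models to the mean model.
rng = np.random.default_rng(0)
S, A = 4, 2
alpha = 1.0 + rng.integers(0, 10, size=(S, A, S))  # posterior pseudo-counts
R = rng.random((S, A))

P_mean = alpha / alpha.sum(axis=2, keepdims=True)  # posterior mean model
V_mean = value_iteration(P_mean, R)

errors = [np.max(np.abs(value_iteration(P, R) - V_mean))
          for P in sample_models(alpha, n_samples=20, rng=rng)]
print("max |V_sample - V_mean| over 20 samples:", max(errors))
```

The sketch makes the computational trade-off visible: each sampled model costs a full value-iteration solve, so the number of samples directly drives running time, which is why choosing that number from the posterior parameters, per state-action pair rather than globally, is the focus of the method.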