Linear least-squares algorithms for temporal difference learning
Machine Learning - Special issue on reinforcement learning
A Bayesian Framework for Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Optimal learning: computational procedures for bayes-adaptive markov decision processes
Optimal learning: computational procedures for bayes-adaptive markov decision processes
Least-squares policy iteration
The Journal of Machine Learning Research
Tree-Based Batch Mode Reinforcement Learning
The Journal of Machine Learning Research
Reinforcement learning with Gaussian processes
ICML '05 Proceedings of the 22nd international conference on Machine learning
Bayesian sparse sampling for on-line reward optimization
ICML '05 Proceedings of the 22nd international conference on Machine learning
An analytic solution to discrete Bayesian reinforcement learning
ICML '06 Proceedings of the 23rd international conference on Machine learning
Online kernel selection for Bayesian reinforcement learning
Proceedings of the 25th international conference on Machine learning
Gaussian process dynamic programming
Neurocomputing
Near-Bayesian exploration in polynomial time
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Tree Exploration for Bayesian RL Exploration
CIMCA '08 Proceedings of the 2008 International Conference on Computational Intelligence for Modelling Control & Automation
A Bayesian sampling approach to exploration in reinforcement learning
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Near-optimal Regret Bounds for Reinforcement Learning
The Journal of Machine Learning Research
Smarter sampling in model-based Bayesian reinforcement learning
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
Gaussian processes for sample efficient reinforcement learning with RMAX-like exploration
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
Robust bayesian reinforcement learning through tight lower bounds
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Thompson sampling: an asymptotically optimal finite-time analysis
ALT'12 Proceedings of the 23rd international conference on Algorithmic Learning Theory
Hi-index | 0.00 |
This paper proposes a simple linear Bayesian approach to reinforcement learning. We show that with an appropriate basis, a Bayesian linear Gaussian model is sufficient for accurately estimating the system dynamics, and in particular when we allow for correlated noise. Policies are estimated by first sampling a transition model from the current posterior, and then performing approximate dynamic programming on the sampled model. This form of approximate Thompson sampling results in good exploration in unknown environments. The approach can also be seen as a Bayesian generalisation of least-squares policy iteration, where the empirical transition matrix is replaced with a sample from the posterior.