We present a new actor-critic learning model in which the critic is drawn from a Bayesian class of non-parametric critics based on Gaussian process temporal difference (GPTD) learning. Such critics model the state-action value function as a Gaussian process, allowing Bayes' rule to be used to compute the posterior distribution over state-action value functions, conditioned on the observed data. Appropriate choices of the prior covariance (kernel) between state-action values and of the parametrization of the policy yield closed-form expressions for the posterior distribution of the gradient of the average discounted return with respect to the policy parameters. The posterior mean, which serves as our estimate of the policy gradient, is used to update the policy, while the posterior covariance allows us to gauge the reliability of that update.
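As a brief sketch of why a Gaussian posterior over the gradient arises (our notation, not fixed by the abstract: \(\eta(\theta)\) denotes the average discounted return and \(\mu^{\pi}\) the discounted state-action occupancy measure), the policy gradient theorem gives

\[
\nabla_{\theta}\,\eta(\theta) \;=\; \int \mu^{\pi}(\mathrm{d}s\,\mathrm{d}a)\; \nabla_{\theta} \log \pi(a \mid s;\theta)\; Q^{\pi}(s,a).
\]

The right-hand side is a linear functional of \(Q^{\pi}\), so a Gaussian process posterior over \(Q^{\pi}\) (here obtained via GPTD) induces a Gaussian posterior over \(\nabla_{\theta}\,\eta\):

\[
\nabla_{\theta}\,\eta \mid \mathcal{D} \;\sim\; \mathcal{N}\big(\mathbb{E}[\nabla_{\theta}\,\eta \mid \mathcal{D}],\; \mathrm{Cov}[\nabla_{\theta}\,\eta \mid \mathcal{D}]\big),
\]

with mean and covariance obtained by applying the same linear operator to the GPTD posterior mean and covariance of \(Q^{\pi}\). When the prior kernel over state-action values is chosen to match the policy parametrization, as the abstract indicates, these integrals reduce to the closed-form expressions used in the update.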