Reinforcement learning with Gaussian processes

Authors:
Yaakov Engel; Shie Mannor; Ron Meir
Affiliations:
University of Alberta, Edmonton, Canada; McGill University, Montreal, Canada; Technion - Israel Institute of Technology, Haifa, Israel
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

References (5):
Learning in embedded systems
Bayesian Q-learning. AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Reinforcement Learning
Neuro-Dynamic Programming
Bias and variance in value function estimation. ICML '04 Proceedings of the twenty-first international conference on Machine learning

Cited by (32):
Bayesian actor-critic algorithms. Proceedings of the 24th international conference on Machine learning
Online kernel selection for Bayesian reinforcement learning. Proceedings of the 25th international conference on Machine learning
Geodesic Gaussian kernels for value function approximation. Autonomous Robots
Regularized Fitted Q-Iteration: Application to Planning. Recent Advances in Reinforcement Learning
Gaussian process dynamic programming. Neurocomputing
Kernelized value function approximation for reinforcement learning. ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Online exploration in least-squares policy iteration. Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Feature Selection for Value Function Approximation Using Bayesian Model Selection. ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
Efficient Uncertainty Propagation for Reinforcement Learning with Limited Data. ICANN '09 Proceedings of the 19th International Conference on Artificial Neural Networks: Part I
Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. ACC'09 Proceedings of the 2009 conference on American Control Conference
Adaptive autonomous control using online value iteration with Gaussian processes. ICRA'09 Proceedings of the 2009 IEEE international conference on Robotics and Automation
Model-based and model-free reinforcement learning for visual servoing. ICRA'09 Proceedings of the 2009 IEEE international conference on Robotics and Automation
Incorporating domain models into Bayesian optimization for RL. ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Reducing reinforcement learning to KWIK online regression. Annals of Mathematics and Artificial Intelligence
Gaussian processes for fast policy optimisation of POMDP-based dialogue managers. SIGDIAL '10 Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Solving non-stationary bandit problems by random sampling from sibling Kalman filters. IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part III
Kalman temporal differences. Journal of Artificial Intelligence Research
Hessian matrix distribution for Bayesian policy gradient reinforcement learning. Information Sciences: an International Journal
Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs. ACM Transactions on Speech and Language Processing (TSLP)
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes. The Journal of Machine Learning Research
Improving Gaussian process value function approximation in policy gradient algorithms. ICANN'11 Proceedings of the 21st international conference on Artificial neural networks - Volume Part II
Sparse Kernel-SARSA(λ) with an eligibility trace. ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Models for autonomously motivated exploration in reinforcement learning. ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
A competitive strategy for function approximation in Q-learning. IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Gradient based algorithms with loss functions and kernels for improved on-policy control. EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Value function approximation through sparse Bayesian modeling. EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
An online kernel-based clustering approach for value function approximation. SETN'12 Proceedings of the 7th Hellenic conference on Artificial Intelligence: theories and applications
Online learning with multiple kernels: A review. Neural Computation
An efficient L2-norm regularized least-squares temporal difference learning algorithm. Knowledge-Based Systems
Linear Bayesian reinforcement learning. IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Monte-Carlo tree search for Bayesian reinforcement learning. Applied Intelligence
Gaussian Processes for POMDP-Based Dialogue Manager Optimization. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

Abstract

Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing two pressing issues that were not adequately treated in the original GPTD paper (Engel et al., 2003). The first is stochasticity in the state transitions; the second concerns action selection and policy improvement. We present a new generative model for the value function, deduced from its relation to the discounted return, and derive a corresponding on-line algorithm for learning the posterior moments of the value Gaussian process. We also present a SARSA-based extension of GPTD, termed GPSARSA, that allows actions to be selected and policies to be gradually improved without requiring a world model.
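To make the GPTD generative model more concrete, here is a minimal batch sketch in Python with NumPy. It assumes the model takes the form r_i = V(x_i) - γV(x_{i+1}) + N_i, where the discounted return is written D(x) = V(x) + ΔV(x), the residuals ΔV are treated as independent with variance σ², and therefore N_i = ΔV(x_i) - γΔV(x_{i+1}). The zero prior mean, the Gaussian (RBF) kernel, the hyperparameter values, and the names rbf_kernel and gptd_posterior are illustrative assumptions, not taken from the paper. The sketch recomputes the posterior moments with one O(t³) solve over a single trajectory; it is not the on-line algorithm described in the abstract.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    # Gaussian (RBF) kernel between the rows of A and B; an assumed kernel choice.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gptd_posterior(states, rewards, gamma=0.9, sigma=0.1, length_scale=1.0):
    """Hypothetical batch GPTD posterior for one trajectory x_0..x_t, rewards r_0..r_{t-1}.

    Assumed model: r = H v + N, where v holds V at the trajectory states,
    N = H dV and dV ~ N(0, sigma^2 I), so Cov(N) = sigma^2 H H^T.
    Prior: V ~ GP(0, k) with an RBF kernel.
    """
    X = np.asarray(states, dtype=float)
    r = np.asarray(rewards, dtype=float)
    t = len(r)
    if X.shape[0] != t + 1:
        raise ValueError("expected one more state than rewards")

    # H is t x (t+1): row i has 1 at column i and -gamma at column i+1.
    H = np.zeros((t, t + 1))
    H[np.arange(t), np.arange(t)] = 1.0
    H[np.arange(t), np.arange(t) + 1] = -gamma

    K = rbf_kernel(X, X, length_scale)
    G = H @ K @ H.T + sigma**2 * (H @ H.T)  # covariance of the observed rewards
    alpha = H.T @ np.linalg.solve(G, r)     # weights for the posterior mean
    C = H.T @ np.linalg.solve(G, H)         # reduction term for the posterior variance

    def predict(Xq):
        Xq = np.atleast_2d(np.asarray(Xq, dtype=float))
        kq = rbf_kernel(Xq, X, length_scale)
        mean = kq @ alpha
        # k(x, x) = 1 for the RBF kernel, so var = 1 - k^T C k.
        var = 1.0 - np.sum((kq @ C) * kq, axis=1)
        return mean, var

    return predict


# Toy usage: five 1-D states on a line with made-up rewards.
states = np.arange(5.0).reshape(-1, 1)
rewards = np.array([1.0, 0.0, 2.0, -1.0])
mean, var = gptd_posterior(states, rewards)(states)
```

With a small noise level, the posterior mean at the visited states approximately satisfies the temporal-difference relation V(x_i) - γV(x_{i+1}) ≈ r_i, while far from the data the mean returns to the prior value 0 and the variance to the prior value 1. For a GPSARSA-style variant, the same computation could in principle run over state-action pairs instead of states, using a kernel defined on pairs (for example, a product of a state kernel and an action kernel, again an assumption), so that the posterior mean acts as a Q-value estimate for action selection.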