Sparse Online Greedy Support Vector Regression
ECML '02 Proceedings of the 13th European Conference on Machine Learning
Least-Squares Temporal Difference Learning
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Reinforcement learning with Gaussian processes
ICML '05 Proceedings of the 22nd International Conference on Machine Learning
Pattern Recognition and Machine Learning (Information Science and Statistics)
ICML '08 Proceedings of the 25th International Conference on Machine Learning
Kernel-Based Least Squares Policy Iteration for Reinforcement Learning
IEEE Transactions on Neural Networks
Sparse Kernel-SARSA(λ) with an eligibility trace
ECML PKDD '11 Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III
ℓ1-Penalized projected Bellman residual
EWRL '11 Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning
Value function approximation through sparse Bayesian modeling
EWRL '11 Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning
A recent surge of research in kernelized approaches to reinforcement learning seeks to bring the benefits of kernelized machine learning to value function approximation. These techniques are fairly new, and different authors have approached the topic with different assumptions and goals; neither a unifying view nor an understanding of the pros and cons of the different approaches has yet emerged. In this paper, we offer a unifying view of the different approaches to kernelized value function approximation for reinforcement learning. We show that, except for different approaches to regularization, Kernelized LSTD (KLSTD) is equivalent to a model-based approach that uses kernelized regression to find approximate reward and transition models, and that Gaussian Process Temporal Difference learning (GPTD) returns a mean value function that is equivalent to these other approaches. We also discuss the relationship between our model-based approach and the earlier Gaussian Processes in Reinforcement Learning (GPRL). Finally, we decompose the Bellman error into the sum of a transition error term and a reward error term, and demonstrate through experiments that this decomposition can be helpful in choosing regularization parameters.
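To make the model-based view and the error decomposition concrete, here is a minimal numpy sketch under illustrative assumptions: a toy 10-state chain MDP, a Gaussian (RBF) kernel, and regularizers lam_r and lam_p chosen arbitrarily. It is not the authors' implementation (in particular, the paper's regularization schemes differ); it only demonstrates that when the reward and the next-state kernel features are each fit by kernelized regression and the value function solves the resulting approximate Bellman equation, the sampled Bellman error splits exactly into a reward error term plus a transition error term.

```python
import numpy as np

# Illustrative sketch of model-based kernelized value function approximation
# on a toy deterministic chain MDP. All names and parameters here are
# assumptions for demonstration, not the paper's reference implementation.

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between 1-D state arrays A and B."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)

# Sampled transitions (s, r, s'): a 10-state chain, policy "move right",
# absorbing goal at the right end with reward 1.
n = 10
S = np.arange(n, dtype=float)            # sampled states
S_next = np.minimum(S + 1, n - 1)        # successor states
r = (S_next == n - 1).astype(float)      # reward on reaching the goal
gamma = 0.9

K = rbf_kernel(S, S)                     # k(s_i, s_j)
K_next = rbf_kernel(S_next, S)           # k(s_i', s_j)

lam_r, lam_p = 1e-3, 1e-3                # illustrative regularizers

# Kernelized regression models for the reward and for the next-state
# kernel features (the "transition model" in kernel feature space).
r_hat = K @ np.linalg.solve(K + lam_r * np.eye(n), r)
K_next_hat = K @ np.linalg.solve(K + lam_p * np.eye(n), K_next)

# Model-based value function: solve K w = r_hat + gamma * K_next_hat w,
# i.e. the Bellman equation of the approximate (learned) model.
w = np.linalg.solve(K - gamma * K_next_hat, r_hat)
V = K @ w                                # value estimates at sampled states

# Sampled Bellman error and its decomposition.
BE = r + gamma * (K_next @ w) - V
reward_error = r - r_hat
transition_error = gamma * (K_next - K_next_hat) @ w

# By construction, BE = reward_error + transition_error (up to round-off).
assert np.allclose(BE, reward_error + transition_error)
print("||BE||         =", np.linalg.norm(BE))
print("||reward err|| =", np.linalg.norm(reward_error))
print("||trans. err|| =", np.linalg.norm(transition_error))
```

In this sketch the split falls out algebraically: since V satisfies the approximate model's Bellman equation exactly, whatever Bellman error remains must come from the reward model's residual and the transition model's residual, which is what makes monitoring the two terms separately useful when tuning lam_r and lam_p.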