We formulate the problem of least-squares temporal difference learning (LSTD) in the framework of least-squares SVMs (LS-SVM). To cope with the large amount (and possibly sequential nature) of training data arising in reinforcement learning, we employ a subspace-based variant of LS-SVM that processes the data sequentially and is hence especially suited for online learning. This approach is adapted from the context of Gaussian process regression and turns the unwieldy original optimization problem (whose computational complexity is cubic in the number of processed data points) into a reduced problem (whose computational complexity is linear in the number of processed data points). We introduce a QR-decomposition-based approach that solves the resulting generalized normal equations incrementally and is numerically more stable than existing update algorithms based on recursive least squares. We also allow a forgetting factor in the updates in order to track non-stationary target functions (i.e., for use with optimistic policy iteration). Experimental comparison with standard CMAC function approximation indicates that LS-SVMs are well suited for online RL.
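To make the subspace idea concrete, below is a minimal sketch of the kind of online sparsification the abstract alludes to: a dictionary of representative states grown by the approximate-linear-dependence test known from kernel RLS and sparse online GP regression. The kernel choice, the threshold nu, and the names rbf and SparseDictionary are our own illustrative assumptions, not the paper's exact construction.

```python
import numpy as np


def rbf(a, b, width=1.0):
    """Gaussian kernel; an illustrative choice, not fixed by the paper."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * width ** 2))


class SparseDictionary:
    """Online dictionary grown by the approximate-linear-dependence test
    from kernel RLS / sparse online GP regression: a new state is admitted
    only if its feature-space projection residual onto the current
    dictionary exceeds the threshold nu.  Hypothetical sketch."""

    def __init__(self, kernel=rbf, nu=0.1):
        self.kernel, self.nu = kernel, nu
        self.points = []          # dictionary states
        self.Kinv = None          # inverse of the dictionary Gram matrix

    def consider(self, x):
        """Return True if x was added to the dictionary."""
        if not self.points:
            self.points.append(x)
            self.Kinv = np.array([[1.0 / self.kernel(x, x)]])
            return True
        k = np.array([self.kernel(p, x) for p in self.points])
        a = self.Kinv @ k                   # projection coefficients
        delta = self.kernel(x, x) - k @ a   # squared projection residual
        if delta <= self.nu:
            return False                    # x is (nearly) representable
        # Admit x; grow Kinv with the standard block-inverse update.
        n = len(self.points)
        grown = np.zeros((n + 1, n + 1))
        grown[:n, :n] = self.Kinv + np.outer(a, a) / delta
        grown[:n, n] = grown[n, :n] = -a / delta
        grown[n, n] = 1.0 / delta
        self.points.append(x)
        self.Kinv = grown
        return True
```

Likewise, the sketch below shows one way an incremental QR-based solver with a forgetting factor can look: the triangular factor R and the rotated target vector z are downweighted and re-triangularized with Givens rotations at each step, and the weights follow by back-substitution. It is applied here to temporal-difference data in a Bellman-residual least-squares form (rows phi(s) - gamma * phi(s'), targets r); the class name QRForgettingRLS, the regularization constant, and the synthetic demo are assumptions, and the paper's generalized normal equations arising from the LS-SVM derivation are not reproduced here.

```python
import numpy as np


class QRForgettingRLS:
    """Exponentially weighted least squares maintained in QR form: keep
    the upper-triangular factor R and rotated target vector z, so the
    current weights solve R w = z by back-substitution."""

    def __init__(self, dim, forget=1.0, reg=1e-3):
        self.lam = forget                    # forgetting factor in (0, 1]
        self.R = np.sqrt(reg) * np.eye(dim)  # small ridge keeps R invertible
        self.z = np.zeros(dim)

    def update(self, x, y):
        d = len(x)
        # Downweight the past, append the new row (x, y) ...
        A = np.vstack([np.sqrt(self.lam) * self.R, x])
        b = np.append(np.sqrt(self.lam) * self.z, y)
        # ... and restore triangularity with one Givens rotation per column.
        for i in range(d):
            r = np.hypot(A[i, i], A[d, i])
            if r == 0.0:
                continue
            c, s = A[i, i] / r, A[d, i] / r
            G = np.array([[c, s], [-s, c]])
            A[[i, d], i:] = G @ A[[i, d], i:]
            b[[i, d]] = G @ b[[i, d]]
        self.R, self.z = A[:d], b[:d]

    def weights(self):
        return np.linalg.solve(self.R, self.z)  # R is triangular


if __name__ == "__main__":
    # Bellman-residual least squares on synthetic transitions:
    # rows h = phi(s) - gamma * phi(s'), targets are rewards r.
    rng = np.random.default_rng(0)
    gamma, dim = 0.95, 8
    w_true = rng.standard_normal(dim)        # hypothetical target weights
    est = QRForgettingRLS(dim, forget=0.999)
    for _ in range(500):
        phi, phi_next = rng.standard_normal(dim), rng.standard_normal(dim)
        h = phi - gamma * phi_next
        est.update(h, float(h @ w_true))     # rewards consistent with w_true
    print(np.linalg.norm(est.weights() - w_true))  # should be near zero
```

Maintaining R directly, rather than propagating an explicit inverse covariance matrix as classical recursive least squares does, is the usual source of the stability advantage the abstract claims for the QR-based update: the Givens rotations never form a matrix inverse during the update itself.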