Kernel-Based Least Squares Policy Iteration for Reinforcement Learning

Authors:
Xin Xu; Dewen Hu; Xicheng Lu
Affiliations:
Nat. Univ. of Defense Technol., Changsha;-;-
Venue:
IEEE Transactions on Neural Networks
Year:
2007



Abstract

In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation step of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep KLSTD-Q solutions sparse and improve their generalization ability, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing results. One is better convergence and a (near) optimality guarantee, obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection using ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task, a stochastic chain problem, demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was evaluated on two nonlinear feedback control problems: a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance.
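
The ALD-based sparsification mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it applies the generic approximate linear dependency test, in which a sample joins the dictionary only if its kernel feature vector cannot be approximated, within a threshold, by a linear combination of the dictionary's feature vectors. The Gaussian kernel, its `width`, the threshold `mu`, and the names `gaussian_kernel` and `ald_dictionary` are assumptions chosen for illustration. In KLSPI the samples would presumably be state-action pairs; plain vectors are used here.

```python
import numpy as np


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) Mercer kernel; the width value is an illustrative assumption."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))


def ald_dictionary(samples, kernel=gaussian_kernel, mu=1e-3):
    """Build a sparse dictionary using the approximate linear dependency test.

    mu is a hypothetical threshold: larger values give a smaller dictionary.
    """
    dictionary = []
    for x in samples:
        if not dictionary:
            # The first sample always enters the dictionary.
            dictionary.append(x)
            continue
        # Kernel matrix of the current dictionary and kernel vector of the new sample.
        K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
        k_vec = np.array([kernel(d, x) for d in dictionary])
        # Coefficients of the best approximation of x's feature vector
        # by the feature vectors of the dictionary elements.
        coeffs = np.linalg.lstsq(K, k_vec, rcond=None)[0]
        # Squared approximation error in feature space.
        delta = kernel(x, x) - float(k_vec @ coeffs)
        if delta > mu:
            # x is not approximately linearly dependent on the dictionary.
            dictionary.append(x)
    return dictionary


samples = [[0.0], [0.0], [5.0], [5.0], [0.0]]
dictionary = ald_dictionary(samples)
```

In this toy input the repeated points are exactly representable by elements already in the dictionary, so only the two distinct points are kept. Recomputing the kernel matrix on every step keeps the sketch short; a practical version would update its inverse incrementally.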