Approximate policy iteration (API) methods, including least-squares policy iteration (LSPI) and its kernelized version (KLSPI), have received increasing attention due to their good convergence and generalization properties in solving difficult reinforcement learning problems. However, the sparsification of feature vectors, especially of kernel-based features, greatly influences the performance of API methods. In this paper, a novel reordering sparsification method is proposed for sparsifying kernel machines in API. The method adopts a greedy strategy that, at each step, adds the sample with the maximal squared approximation error to the kernel dictionary, so that the samples are effectively reordered to improve the quality of kernel sparsification. Experimental results on the learning control of an inverted pendulum verify that the proposed algorithm produces a smaller kernel dictionary than the previous sequential sparsification algorithm at the same sparsification level, and that the control policies learned by KLSPI are also improved.
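The sketch below illustrates the greedy reordering idea described above under stated assumptions: the squared approximation error is taken to be an ALD-style residual (the squared distance, in the kernel feature space, between a sample and its projection onto the span of the current dictionary), and the Gaussian kernel, threshold value, and function names are illustrative choices, not details given in the abstract.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel; the kernel choice and width are illustrative assumptions.
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def reordering_sparsification(samples, kernel, threshold=1e-3):
    """Greedy dictionary construction: at each step, add the remaining sample with
    the largest squared kernel approximation error, so samples are effectively
    reordered rather than scanned in their original (arrival) order."""
    samples = [np.asarray(s, dtype=float) for s in samples]
    # Seed the dictionary with an arbitrary first sample.
    dictionary = [samples[0]]
    remaining = samples[1:]

    while remaining:
        # Kernel matrix of the current dictionary (small ridge term for stability).
        K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
        K_inv = np.linalg.inv(K + 1e-8 * np.eye(len(dictionary)))

        # Squared approximation error of each remaining sample w.r.t. the dictionary span.
        errors = []
        for x in remaining:
            k_vec = np.array([kernel(d, x) for d in dictionary])
            delta = kernel(x, x) - k_vec @ K_inv @ k_vec
            errors.append(delta)

        best = int(np.argmax(errors))
        if errors[best] <= threshold:
            break  # all remaining samples are well approximated; stop growing
        dictionary.append(remaining.pop(best))

    return dictionary

if __name__ == "__main__":
    # Toy usage: sparsify 200 random 2-D samples (state space and data are hypothetical).
    rng = np.random.default_rng(0)
    data = rng.uniform(-1.0, 1.0, size=(200, 2))
    D = reordering_sparsification(list(data), rbf_kernel, threshold=0.05)
    print(f"dictionary size: {len(D)} of {len(data)} samples")
```

Compared with a purely sequential pass, always inserting the currently worst-approximated sample tends to cover the sample space with fewer dictionary elements, which is consistent with the smaller dictionaries reported in the inverted-pendulum experiments.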