The Kernel-based Least-Squares Policy Iteration (KLSPI) algorithm provides a general reinforcement-learning solution for large-scale Markov decision problems. In KLSPI, the Radial Basis Function (RBF) kernel is typically used to approximate the optimal value function with high precision. However, selecting a proper kernel width for the RBF kernel function is critical for KLSPI to succeed. In previous research, the kernel width was usually set manually or computed in advance from the sample distribution, which requires prior knowledge or model information. In this paper, an adaptive kernel-width selection method is proposed for the KLSPI algorithm. First, a sparsification procedure based on neighborhood analysis with the l2-ball of radius ε is adopted, which yields a reduced kernel dictionary without presetting the kernel width. Second, a gradient-descent method based on the Bellman Residual Error (BRE) is proposed to find a kernel width that minimizes the sum of the Bellman residual errors. Experimental results show that the proposed method helps KLSPI approximate the true value function more accurately and, in turn, obtain a better control policy.
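The two steps described in the abstract can be sketched as follows. This is only an illustrative outline, not the paper's exact formulation: the function names, the toy 1-D chain data, the LSTD-style refit of the weights at each candidate width, and the finite-difference gradient with backtracking are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def sparsify(samples, eps):
    """Neighborhood-analysis sparsification: keep a sample as a new
    dictionary atom only if it lies outside the l2-ball of radius eps
    around every atom already in the dictionary (no kernel width needed)."""
    dictionary = []
    for s in samples:
        if all(np.linalg.norm(s - d) > eps for d in dictionary):
            dictionary.append(s)
    return np.array(dictionary)

def rbf_features(x, dictionary, width):
    """Gaussian RBF features phi_j(x) = exp(-||x - c_j||^2 / (2 width^2))."""
    d2 = ((x[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def bre_loss(width, states, rewards, next_states, dictionary, gamma):
    """Sum of squared Bellman residuals at a given kernel width, with the
    value-function weights refit (LSTD-style, small ridge term) at that width."""
    phi = rbf_features(states, dictionary, width)
    phi_next = rbf_features(next_states, dictionary, width)
    A = phi.T @ (phi - gamma * phi_next) + 1e-6 * np.eye(phi.shape[1])
    w = np.linalg.solve(A, phi.T @ rewards)
    residual = phi @ w - (rewards + gamma * (phi_next @ w))
    return float(residual @ residual)

def tune_width(width, loss_fn, steps=30, lr=1e-2, h=1e-4):
    """Gradient descent on the BRE objective, using a numerical gradient
    and simple backtracking so every accepted update is a descent step."""
    for _ in range(steps):
        grad = (loss_fn(width + h) - loss_fn(width - h)) / (2.0 * h)
        step = lr * grad
        # halve the step until it no longer increases the loss
        while abs(step) > 1e-12 and not (loss_fn(width - step) <= loss_fn(width)):
            step *= 0.5
        width -= step
    return width

# Demo on a toy 1-D chain (illustrative data): s' = 0.9 s, r(s) = -s^2
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 1))
next_states = 0.9 * states
rewards = -(states[:, 0] ** 2)

D = sparsify(states, eps=0.15)  # reduced kernel dictionary, no width preset
loss = lambda s: bre_loss(s, states, rewards, next_states, D, gamma=0.95)
width = tune_width(0.5, loss)
```

The key property mirrored here is that the dictionary is built purely from pairwise distances (so no kernel width is needed in step one), while step two treats the width as the only free hyperparameter and descends the Bellman residual objective.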