Adaptive kernel-width selection for kernel-based least-squares policy iteration algorithm

  • Authors:
  • Jun Wu;Xin Xu;Lei Zuo;Zhaobin Li;Jian Wang

  • Affiliations:
  • Institute of Automation, National University of Defense Technology, Changsha, P.R. China (all five authors)

  • Venue:
  • ISNN'11: Proceedings of the 8th International Conference on Advances in Neural Networks, Volume Part II
  • Year:
  • 2011

Abstract

The Kernel-based Least-squares Policy Iteration (KLSPI) algorithm provides a general reinforcement learning solution for large-scale Markov decision problems. In KLSPI, the Radial Basis Function (RBF) kernel is usually used to approximate the optimal value function with high precision. However, the successful application of KLSPI depends heavily on choosing a proper kernel-width for the RBF kernel. In previous research, the kernel-width was usually set manually or computed in advance from the sample distribution, which requires prior knowledge or model information. In this paper, an adaptive kernel-width selection method is proposed for the KLSPI algorithm. First, a sparsification procedure with neighborhood analysis based on the ℓ2-ball of radius ε is adopted, which yields a reduced kernel dictionary without presetting the kernel-width. Second, a gradient descent method based on the Bellman Residual Error (BRE) is proposed to find a kernel-width that minimizes the sum of the BRE. Experimental results show that the proposed method helps KLSPI approximate the true value function more accurately and, consequently, obtain a better control policy.
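The abstract gives no formulas, so the following is only a minimal sketch of the two-stage idea under stated assumptions, not the authors' implementation. Assumptions: (1) it covers a single policy-evaluation step on state samples (s, r, s') rather than KLSPI's state-action kernels inside full policy iteration; (2) the "neighborhood analysis" is modeled as keeping a sample only if no dictionary element lies within its ℓ2-ball of radius ε, which needs no kernel-width; (3) weights come from an LSTD-style fixed point with a small ridge term; (4) the BRE gradient with respect to the width is approximated by central finite differences on log σ (with step halving), not an analytic derivative. All function names (`sparsify`, `lstd_weights`, `bre`, `adapt_width`) and the toy data are hypothetical.

```python
import numpy as np

def rbf(X, C, sigma):
    # Gaussian RBF kernel matrix between samples X (n, d) and dictionary C (m, d)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def sparsify(X, eps):
    # Keep a sample only if no dictionary element lies within its l2-ball of radius eps.
    # Only Euclidean distances are used, so no kernel-width is needed at this stage.
    dictionary = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - c) for c in dictionary) > eps:
            dictionary.append(x)
    return np.array(dictionary)

def lstd_weights(S, S_next, R, C, sigma, gamma, reg=1e-6):
    # LSTD fixed point: sum_i phi(s_i)(phi(s_i) - gamma*phi(s'_i))^T w = sum_i phi(s_i) r_i
    P, P_next = rbf(S, C, sigma), rbf(S_next, C, sigma)
    A = P.T @ (P - gamma * P_next) + reg * np.eye(len(C))  # small ridge term for stability
    return np.linalg.solve(A, P.T @ R)

def bre(S, S_next, R, C, sigma, gamma):
    # Sum of squared Bellman residuals r + gamma*V(s') - V(s) over all samples
    w = lstd_weights(S, S_next, R, C, sigma, gamma)
    delta = R + gamma * rbf(S_next, C, sigma) @ w - rbf(S, C, sigma) @ w
    return float(delta @ delta)

def adapt_width(S, S_next, R, C, sigma0, gamma, lr=0.1, steps=50, h=1e-4):
    # Gradient descent on t = log(sigma) keeps the width positive.
    # The gradient is a central finite difference; a step is accepted only if BRE drops.
    f = lambda t: bre(S, S_next, R, C, np.exp(t), gamma)
    t = np.log(sigma0)
    best = f(t)
    for _ in range(steps):
        g = (f(t + h) - f(t - h)) / (2 * h)
        cand = t - lr * g
        val = f(cand)
        if val < best:          # NaN compares False, so unstable candidates are rejected
            t, best = cand, val
        else:
            lr *= 0.5           # backtrack: shrink the step when BRE did not decrease
    return float(np.exp(t)), best

# Toy 1-D problem (hypothetical): noisy drift on [0, 1] with a sinusoidal reward
rng = np.random.default_rng(0)
S = rng.uniform(0, 1, size=(200, 1))
S_next = np.clip(S + 0.1 * rng.standard_normal(S.shape), 0, 1)
R = np.sin(2 * np.pi * S[:, 0])
C = sparsify(S, eps=0.1)
sigma, err = adapt_width(S, S_next, R, C, sigma0=0.5, gamma=0.9)
print(f"dictionary size={len(C)}, sigma={sigma:.4f}, sum BRE={err:.4f}")
```

Parameterizing by log σ and accepting only BRE-decreasing steps is a design choice for robustness in this sketch; the paper's own update rule and step size may differ.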