Kernel rewards regression: an information efficient batch policy iteration approach

  • Authors:
  • Daniel Schneegaß; Steffen Udluft; Thomas Martinetz

  • Affiliations:
  • Information & Communications, Learning Systems, Siemens AG, Corporate Technology, Munich, Germany; Information & Communications, Learning Systems, Siemens AG, Corporate Technology, Munich, Germany; Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany

  • Venue:
  • AIA '06: Proceedings of the 24th IASTED International Conference on Artificial Intelligence and Applications
  • Year:
  • 2006


Abstract

We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method obtains useful policies from observing just a few state-action transitions. It considers the Reinforcement Learning problem as a regression task, to which any appropriate technique may be applied. The use of kernel methods, e.g. the Support Vector Machine, enables the user to incorporate different types of structural prior knowledge about the state space by redefining the inner product. Furthermore, KRR is a completely off-policy method: the observations may be generated by any sufficiently exploring policy, even a fully random one. We tested the algorithm on three typical Reinforcement Learning benchmarks. Moreover, we give a proof of the correctness of our model and an error bound for the estimated Q-functions.
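
The abstract describes the idea only verbally; the following minimal Python sketch illustrates how rewards regression can turn batch policy evaluation into an ordinary kernel regression problem. The observed rewards serve as regression targets, and the Bellman equation r_i ≈ Q(x_i) − γ Q(x'_i) makes the design matrix a difference of kernel evaluations. Everything here (the Gaussian kernel, the ridge regularizer, the discrete action set, and all function names) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch only: kernel choice, regularizer, and names
# are assumptions, not the KRR paper's actual construction.

def rbf_kernel(X, Y, width=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_q_weights(S, A, R, S_next, policy, gamma=0.95, reg=1e-3):
    """Regress rewards: r_i ~ sum_j alpha_j [k(x_i, x_j) - gamma k(x'_i, x_j)],
    where x_i = (s_i, a_i) and x'_i = (s_{i+1}, policy(s_{i+1}))."""
    X = np.hstack([S, A])
    X_next = np.hstack([S_next, np.array([policy(s) for s in S_next])])
    Phi = rbf_kernel(X, X) - gamma * rbf_kernel(X_next, X)
    # Ridge-regularized least squares for the kernel weights alpha.
    alpha = np.linalg.solve(Phi.T @ Phi + reg * np.eye(len(X)), Phi.T @ R)
    return X, alpha

def q_value(s, a, X, alpha, width=1.0):
    """Q(s, a) as a kernel expansion over the training state-action pairs."""
    x = np.concatenate([s, a])[None, :]
    return (rbf_kernel(x, X, width) @ alpha)[0]

def greedy_policy(X, alpha, actions):
    """Policy improvement: pick the action with the highest estimated Q."""
    return lambda s: actions[int(np.argmax(
        [q_value(s, a, X, alpha) for a in actions]))]

if __name__ == "__main__":
    # Toy batch of random transitions with a "stay near the origin" reward.
    rng = np.random.default_rng(0)
    actions = np.array([[0.0], [1.0]])
    n = 50
    S = rng.normal(size=(n, 2))
    A = actions[rng.integers(len(actions), size=n)]
    S_next = S + 0.1 * rng.normal(size=(n, 2))
    R = -np.linalg.norm(S, axis=1)
    X, alpha = fit_q_weights(S, A, R, S_next, policy=lambda s: actions[0])
    pi = greedy_policy(X, alpha, actions)
    print("greedy action at the origin:", pi(np.zeros(2)))
```

Alternating `fit_q_weights` (evaluation) with `greedy_policy` (improvement) gives a simple batch policy-iteration loop in the spirit of the abstract; kernel width and regularization strength would need tuning per domain.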