Kernel rewards regression: an information efficient batch policy iteration approach

  • Authors:
  • Daniel Schneegaß; Steffen Udluft; Thomas Martinetz

  • Affiliations:
  • Information & Communications, Learning Systems, Siemens AG, Corporate Technology, Munich, Germany; Information & Communications, Learning Systems, Siemens AG, Corporate Technology, Munich, Germany; Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany

  • Venue:
  • AIA '06: Proceedings of the 24th IASTED International Conference on Artificial Intelligence and Applications
  • Year:
  • 2006


Abstract

We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method obtains useful policies from observing just a few state-action transitions. It considers the Reinforcement Learning problem as a regression task, to which any appropriate technique may be applied. The use of kernel methods, e.g. the Support Vector Machine, enables the user to incorporate different types of structural prior knowledge about the state space by redefining the inner product. Furthermore, KRR is a completely off-policy method: the observations may be generated by any sufficiently exploring policy, even a fully random one. We tested the algorithm on three typical Reinforcement Learning benchmarks. Moreover, we give a proof of the correctness of our model and an error bound for the estimated Q-functions.
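
The abstract describes the idea only verbally; the following minimal Python sketch illustrates how rewards regression can turn batch policy evaluation into an ordinary kernel regression problem. The observed rewards serve as regression targets, and the Bellman equation r_i ≈ Q(x_i) − γ Q(x'_i) makes the design matrix a difference of kernel evaluations. Everything here (the Gaussian kernel, the ridge regularizer, the discrete action set, and all function names) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch only: kernel choice, regularizer, and names
# are assumptions, not the KRR paper's actual construction.

def rbf_kernel(X, Y, width=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_q_weights(S, A, R, S_next, policy, gamma=0.95, reg=1e-3):
    """Regress rewards: r_i ~ sum_j alpha_j [k(x_i, x_j) - gamma k(x'_i, x_j)],
    where x_i = (s_i, a_i) and x'_i = (s_{i+1}, policy(s_{i+1}))."""
    X = np.hstack([S, A])
    X_next = np.hstack([S_next, np.array([policy(s) for s in S_next])])
    Phi = rbf_kernel(X, X) - gamma * rbf_kernel(X_next, X)
    # Ridge-regularized least squares for the kernel weights alpha.
    alpha = np.linalg.solve(Phi.T @ Phi + reg * np.eye(len(X)), Phi.T @ R)
    return X, alpha

def q_value(s, a, X, alpha, width=1.0):
    """Q(s, a) as a kernel expansion over the training state-action pairs."""
    x = np.concatenate([s, a])[None, :]
    return (rbf_kernel(x, X, width) @ alpha)[0]

def greedy_policy(X, alpha, actions):
    """Policy improvement: pick the action with the highest estimated Q."""
    return lambda s: actions[int(np.argmax(
        [q_value(s, a, X, alpha) for a in actions]))]

if __name__ == "__main__":
    # Toy batch of random transitions with a "stay near the origin" reward.
    rng = np.random.default_rng(0)
    actions = np.array([[0.0], [1.0]])
    n = 50
    S = rng.normal(size=(n, 2))
    A = actions[rng.integers(len(actions), size=n)]
    S_next = S + 0.1 * rng.normal(size=(n, 2))
    R = -np.linalg.norm(S, axis=1)
    X, alpha = fit_q_weights(S, A, R, S_next, policy=lambda s: actions[0])
    pi = greedy_policy(X, alpha, actions)
    print("greedy action at the origin:", pi(np.zeros(2)))
```

Alternating `fit_q_weights` (evaluation) with `greedy_policy` (improvement) gives a simple batch policy-iteration loop in the spirit of the abstract; kernel width and regularization strength would need tuning per domain.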