We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method obtains useful policies from observing only a few state-action transitions. It casts the Reinforcement Learning problem as a regression task, to which any appropriate regression technique may be applied. The use of kernel methods, e.g. the Support Vector Machine, enables the user to incorporate different types of structural prior knowledge about the state space by redefining the inner product. Furthermore, KRR is a completely off-policy method: the observations may be generated by any sufficiently exploring policy, even a fully random one. We tested the algorithm on three typical Reinforcement Learning benchmarks. Moreover, we prove the correctness of our model and give an error bound for estimating the Q-function.
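The core idea of regressing observed rewards to recover a kernel-expanded Q-function can be sketched as follows. This is a minimal, action-free illustration, not the paper's algorithm: with Q(x) = Σ_j α_j k(s_j, x), the Bellman relation r_i ≈ Q(s_i) − γ·Q(s'_i) becomes a linear system in the coefficients α, solvable by ridge-regularised least squares. The Gaussian kernel, bandwidth, and ridge parameter below are illustrative assumptions; the full KRR method works on state-action pairs and supports other kernels and regression techniques.

```python
import numpy as np

def gaussian_kernel(X, Y, width=1.0):
    # Pairwise Gaussian (RBF) kernel between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_krr_q(states, next_states, rewards, gamma=0.9, ridge=1e-6, width=0.3):
    """Fit Q(x) = sum_j alpha_j * k(s_j, x) so that the observed rewards
    satisfy r_i ~= Q(s_i) - gamma * Q(s'_i) (regression on rewards)."""
    K = gaussian_kernel(states, states, width)        # K[i, j]  = k(s_i,  s_j)
    Kn = gaussian_kernel(next_states, states, width)  # Kn[i, j] = k(s'_i, s_j)
    A = K - gamma * Kn                                # rewards ~= A @ alpha
    # Ridge-regularised least squares for the expansion coefficients alpha.
    alpha = np.linalg.solve(A.T @ A + ridge * np.eye(len(states)), A.T @ rewards)
    return lambda x: gaussian_kernel(np.atleast_2d(x), states, width) @ alpha

# Example: transitions from a simple shrink-toward-zero dynamic (hypothetical data).
rng = np.random.default_rng(0)
s = rng.uniform(0.0, 1.0, size=(30, 1))   # sampled states
s_next = 0.8 * s                          # deterministic transitions
r = (1.0 - s).ravel()                     # state-dependent rewards
q = fit_krr_q(s, s_next, r)               # callable Q-function estimate
```

Because the transitions may come from any sufficiently exploring behaviour policy, the same fit works off-policy; only the triples (s_i, s'_i, r_i) enter the regression.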