Nonlinear component analysis as a kernel eigenvalue problem
Neural Computation
Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Technical Update: Least-Squares Temporal Difference Learning
Machine Learning
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Least-squares policy iteration
The Journal of Machine Learning Research
Reinforcement learning with Gaussian processes
ICML '05 Proceedings of the 22nd international conference on Machine learning
The Journal of Machine Learning Research
Samuel meets Amarel: automating value function approximation using global state space analysis
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
An analysis of Laplacian methods for value function approximation in MDPs
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
The global kernel k-means algorithm for clustering in feature space
IEEE Transactions on Neural Networks
Kernel-Based Least Squares Policy Iteration for Reinforcement Learning
IEEE Transactions on Neural Networks
Hi-index | 0.00 |
Value function approximation is a critical task in solving Markov decision processes and accurately modeling reinforcement learning agents. A significant issue is how to construct efficient feature spaces from samples collected by the environment in order to obtain an optimal policy. The particular study addresses this challenge by proposing an on-line kernel-based clustering approach for building appropriate basis functions during the learning process. The method uses a kernel function capable of handling pairs of state-action as sequentially generated by the agent. At each time step, the procedure either adds a new cluster, or adjusts the winning cluster's parameters. By considering the value function as a linear combination of the constructed basis functions, the weights are optimized in a temporal-difference framework in order to minimize the Bellman approximation error. The proposed method is evaluated in numerous known simulated environments.