We consider a variant of Q-learning for continuous state spaces under the total expected discounted cost criterion, combined with local function approximation methods. Provided that the function approximator satisfies certain interpolation properties, the resulting algorithm is shown to converge with probability one. The limit function is shown to satisfy a fixed-point equation of the Bellman type, where the fixed-point operator depends on the stationary distribution of the exploration policy and on the function approximation method. The basic algorithm is extended in several ways; in particular, a variant is obtained that is shown to converge in probability to the optimal Q-function. Preliminary computer simulations are presented that confirm the validity of the approach.
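The following is a minimal, illustrative sketch of the kind of scheme the abstract describes: Q-learning on a continuous state space where the Q-function is represented by a local, interpolating approximator. Here the approximator is piecewise-linear interpolation on a uniform grid, which is a nonexpansion, the sort of interpolation property such convergence analyses rely on. The dynamics, cost function, step-size schedule, and all parameters are hypothetical, chosen only to make the example self-contained; this is not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 21                          # grid nodes on [0, 1]
grid = np.linspace(0.0, 1.0, N)
A = 2                           # two actions: drift left / drift right
Q = np.zeros((N, A))            # Q-values stored only at the grid nodes
gamma = 0.9                     # discount factor

def step(s, a):
    """Toy dynamics (hypothetical): noisy drift; cost = distance from 0.5."""
    drift = 0.1 if a == 1 else -0.1
    s_next = float(np.clip(s + drift + 0.02 * rng.standard_normal(), 0.0, 1.0))
    cost = abs(s_next - 0.5)
    return s_next, cost

def weights(s):
    """Linear-interpolation coefficients of state s over the grid nodes."""
    i = min(int(s * (N - 1)), N - 2)
    t = s * (N - 1) - i
    w = np.zeros(N)
    w[i], w[i + 1] = 1.0 - t, t
    return w

def q_value(s, a):
    return float(weights(s) @ Q[:, a])

s = rng.random()
for k in range(1, 20001):
    a = int(rng.integers(A))            # uniform (random) exploration policy
    s_next, cost = step(s, a)
    # TD target under cost minimization: cost plus discounted best Q at s'.
    target = cost + gamma * min(q_value(s_next, b) for b in range(A))
    w = weights(s)
    alpha = 1.0 / (1.0 + 0.01 * k)      # decaying step size
    # Local update: only nodes with nonzero interpolation weight move,
    # each in proportion to its weight.
    Q[:, a] += alpha * w * (target - w @ Q[:, a])
    s = s_next

def greedy(s):
    """Greedy (cost-minimizing) action under the learned Q-function."""
    return int(np.argmin([q_value(s, b) for b in range(A)]))
```

After training, the greedy policy should steer the state toward the low-cost region around 0.5 (moving right from small states, left from large ones); the interpolation weights keep every update local to the two surrounding grid nodes.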