Adaptive algorithms and stochastic approximations.
Feature-based methods for large scale dynamic programming. Machine Learning (special issue on reinforcement learning).
Stochastic approximation with two time scales. Systems & Control Letters.
On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications.
Neuro-Dynamic Programming.
Kernel-based reinforcement learning. Machine Learning.
Off-policy temporal difference learning with function approximation. Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01).
On the existence of fixed points for Q-learning and Sarsa in partially observable domains. Proceedings of the Nineteenth International Conference on Machine Learning (ICML '02).
Stable function approximation in dynamic programming.
A learning algorithm for discrete-time stochastic control. Probability in the Engineering and Informational Sciences.
Interpolation-based Q-learning. Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04).
Model-free reinforcement learning as mixture learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Coordinated learning in multiagent MDPs with infinite state-space. Autonomous Agents and Multi-Agent Systems.
Proceedings of the 11th International Conference on Adaptive and Learning Agents (ALA '11).
Sparse gradient-based direct policy search. Proceedings of the 19th International Conference on Neural Information Processing (ICONIP '12), Part IV.
The Journal of Machine Learning Research.
Journal of Intelligent and Robotic Systems.
Policy oscillation is overshooting. Neural Networks.
We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings. We identify conditions under which these approximate methods converge with probability 1. We conclude with a brief discussion of the general applicability of our results and a comparison with several related works.
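As a concrete illustration of the setting the abstract describes, the following is a minimal sketch of Q-learning with linear function approximation. It is not the paper's exact algorithm: the environment interface `env`, the feature map `phi`, the epsilon-greedy exploration, and the 1/t step-size schedule are all assumptions introduced here for the example.

```python
# A minimal sketch (not the paper's construction) of Q-learning with linear
# function approximation. `env`, `phi`, epsilon-greedy exploration, and the
# 1/t step sizes are illustrative assumptions.
import numpy as np

def linear_q_learning(env, phi, n_features, n_actions,
                      gamma=0.99, epsilon=0.1, n_steps=100_000, seed=0):
    """phi(state, action) must return a numpy vector of length n_features."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_features)   # weights: Q(s, a) is approximated by phi(s, a) @ theta
    state = env.reset()
    for t in range(1, n_steps + 1):
        q_values = [phi(state, a) @ theta for a in range(n_actions)]
        # Epsilon-greedy action choice; convergence analyses typically
        # require that every action keeps being tried (sufficient exploration).
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q_values))
        next_state, reward, done = env.step(action)
        # Q-learning TD target: bootstrap with the greedy value at next_state.
        q_next = 0.0 if done else max(phi(next_state, a) @ theta
                                      for a in range(n_actions))
        td_error = reward + gamma * q_next - phi(state, action) @ theta
        # Stochastic-approximation update with a diminishing step size.
        alpha = 1.0 / t
        theta += alpha * td_error * phi(state, action)
        state = env.reset() if done else next_state
    return theta
```

The 1/t schedule is one choice satisfying the standard stochastic-approximation step-size conditions (the step sizes sum to infinity while their squares sum to a finite value), which is the kind of requirement under which convergence with probability 1 is typically established for methods in this family.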