Technical Note: \cal Q-Learning
Machine Learning
Asynchronous Stochastic Approximation and Q-Learning
Machine Learning
The asymptotic convergence-rate of Q-learning
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
The O.D. E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
SIAM Journal on Control and Optimization
Finite-sample convergence rates for Q-learning and indirect algorithms
Proceedings of the 1998 conference on Advances in neural information processing systems II
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
On the convergence of stochastic iterative dynamic programming algorithms
Neural Computation
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Testing probabilistic equivalence through reinforcement learning
FSTTCS'06 Proceedings of the 26th international conference on Foundations of Software Technology and Theoretical Computer Science
Trace equivalence characterization through reinforcement learning
AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence
Hi-index | 0.00 |
In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in the Q-learning. For a polynomial learning rate, one which is 1/tω at time t where ω ∈ (1/2, 1), we show that that the convergence rate is polynomial in 1/(1 - γ), where γ is the discount factor. In contrast we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1 - γ). In addition we show a simple example that proves that this exponential behavior is inherent for a linear learning rate.