Learning Rates for Q-Learning

Authors:
Eyal Even-Dar;Yishay Mansour
Affiliations:
-;-
Venue:
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and and 5th European Conference on Computational Learning Theory
Year:
2001

Citing 9
Cited 3

Technical Note: \cal Q-Learning

Machine Learning
Asynchronous Stochastic Approximation and Q-Learning

Machine Learning
The asymptotic convergence-rate of Q-learning

NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
The O.D. E. Method for Convergence of Stochastic Approximation and Reinforcement Learning

SIAM Journal on Control and Optimization
Finite-sample convergence rates for Q-learning and indirect algorithms

Proceedings of the 1998 conference on Advances in neural information processing systems II
Markov Decision Processes: Discrete Stochastic Dynamic Programming

Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
Neuro-Dynamic Programming

Neuro-Dynamic Programming
On the convergence of stochastic iterative dynamic programming algorithms

Neural Computation

PAC Bounds for Multi-armed Bandit and Markov Decision Processes

COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Testing probabilistic equivalence through reinforcement learning

FSTTCS'06 Proceedings of the 26th international conference on Foundations of Software Technology and Theoretical Computer Science
Trace equivalence characterization through reinforcement learning

AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in the Q-learning. For a polynomial learning rate, one which is 1/tω at time t where ω ∈ (1/2, 1), we show that that the convergence rate is polynomial in 1/(1 - γ), where γ is the discount factor. In contrast we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1 - γ). In addition we show a simple example that proves that this exponential behavior is inherent for a linear learning rate.