Learning Rates for Q-learning

Authors:
Eyal Even-Dar;Yishay Mansour
Venue:
The Journal of Machine Learning Research
Year:
2004

Abstract

In this paper we derive convergence rates for Q-learning and show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, i.e. one that is 1/t^ω at time t with ω ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1-γ), where γ is the discount factor. In contrast, for a linear learning rate, i.e. one that is 1/t at time t, we show that the convergence rate has an exponential dependence on 1/(1-γ). In addition, we give a simple example proving that this exponential behavior is inherent to linear learning rates.
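The contrast between the two learning rates can be seen in a toy setting. The sketch below is our own illustration, not necessarily the construction used in the paper: it assumes a hypothetical MDP with one state, one action and reward 0 (so Q* = 0), starts from Q_0 = 1, and applies the standard Q-learning update with step size 1/t^ω. The helper names `learning_rate` and `q_error_after` are hypothetical. With ω = 1 the error equals the product of (1 - (1-γ)/t) over t = 1..T, which behaves like T^(-(1-γ)), so reaching error ε takes on the order of (1/ε)^(1/(1-γ)) updates.

```python
def learning_rate(t: int, omega: float) -> float:
    """Step size at update t (t >= 1): 1 / t**omega.

    omega == 1.0 gives the linear rate 1/t; omega in (1/2, 1) gives a
    polynomial rate.
    """
    return 1.0 / t ** omega


def q_error_after(steps: int, gamma: float, omega: float, q0: float = 1.0) -> float:
    """Error |Q_T - Q*| after `steps` Q-learning updates.

    Assumption (illustrative only): a single state with a single action and
    reward 0, so the optimal value Q* is 0 and Q_T itself is the error.
    """
    q = q0
    for t in range(1, steps + 1):
        alpha = learning_rate(t, omega)
        # Target r + gamma * max_a' Q(s', a'); r = 0 and there is only one action.
        target = 0.0 + gamma * q
        q = (1.0 - alpha) * q + alpha * target
    return abs(q)


if __name__ == "__main__":
    gamma, steps = 0.9, 100_000
    print("linear     (omega=1.0):", q_error_after(steps, gamma, 1.0))
    print("polynomial (omega=0.6):", q_error_after(steps, gamma, 0.6))
```

With γ = 0.9 and 10^5 updates, the linear rate still leaves an error of roughly 0.3 (about T^(-0.1)/Γ(0.9)), while the polynomial rate with ω = 0.6 drives it far below 10^-6. Each step shrinks the error by a factor 1 - α_t(1-γ), and a linear rate makes these steps too small too quickly.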