On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

  • Authors:
  • Huizhen Yu; Dimitri P. Bertsekas

  • Affiliations:
  • Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Laboratory for Information and Decision Systems and Department of EECS, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

  • Venue:
  • Mathematics of Operations Research
  • Year:
  • 2013

Abstract

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite-space stochastic shortest path (SSP) problems, which are undiscounted, total-cost Markov decision processes with an absorbing, cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185--202] and completely establishing the convergence of Q-learning for these SSP models.
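
For orientation, the sketch below shows the kind of tabular, asynchronous Q-learning iteration the abstract refers to: an undiscounted total-cost update with an absorbing, cost-free terminal state. This is an illustrative sketch only, not the paper's algorithm statement or code; the environment interface `sample_transition`, the toy step-size rule, and the uniform exploration policy are assumptions made for illustration.

```python
import numpy as np

def q_learning_ssp(n_states, n_actions, sample_transition, terminal_state,
                   n_iters=100_000, seed=0):
    """Minimal sketch of asynchronous tabular Q-learning for an SSP
    (undiscounted, total-cost) problem. `sample_transition(s, a, rng)` is an
    assumed callable returning (cost, next_state)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))       # Q-values; terminal state kept at 0
    visits = np.zeros((n_states, n_actions))  # per-pair counts for diminishing step sizes
    state = rng.integers(n_states - 1)        # start anywhere except the terminal state

    for _ in range(n_iters):
        action = rng.integers(n_actions)      # exploratory (uniform) action selection
        cost, next_state = sample_transition(state, action, rng)
        visits[state, action] += 1
        alpha = 1.0 / visits[state, action]   # diminishing step size for this pair
        # Undiscounted total-cost target: stage cost plus min future Q-value.
        target = cost + Q[next_state].min()
        Q[state, action] += alpha * (target - Q[state, action])
        # Restart from a random nonterminal state once the absorbing state is reached.
        state = next_state if next_state != terminal_state else rng.integers(n_states - 1)

    return Q
```

Two design points mirror the SSP setting described above: there is no discount factor in the target, and the Q-values of the absorbing, cost-free state are never updated and remain zero, so costs accumulate only until termination.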