A Geometric Approach to Multi-Criterion Reinforcement Learning. The Journal of Machine Learning Research.
An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem. Mathematics of Operations Research.
Reinforcement Learning: A Tutorial Survey and Recent Advances. INFORMS Journal on Computing.
Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming. Mathematics of Operations Research.
Stochastic Approximation with Long Range Dependent and Heavy Tailed Noise. Queueing Systems: Theory and Applications.
Design with Shape Grammars and Reinforcement Learning. Advanced Engineering Informatics.
On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems. Mathematics of Operations Research.
We discuss synchronous and asynchronous iterations of the form $x_{k+1} = x_k + \gamma(k)\,(h(x_k) + w_k)$, where $h$ is a suitable map and $\{w_k\}$ is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case, these are stochastic approximation iterations that can be analyzed with the ODE approach, based either on Kushner and Clark's lemma in the synchronous case or on Borkar's theorem in the asynchronous case. However, the analysis requires that the iterates $\{x_k\}$ be bounded, a fact that is usually hard to prove. We develop a novel framework for establishing boundedness in the deterministic setting, which also applies to the stochastic case whenever the deterministic hypotheses can be verified in the almost sure sense. The framework is based on scaling ideas and on properties of Lyapunov functions. We then combine the boundedness property with Borkar's stability analysis of ODEs involving nonexpansive mappings to prove convergence (with probability 1 in the stochastic case). We also apply our convergence analysis to Q-learning algorithms for stochastic shortest path problems and are able to relax some of the assumptions of currently available results.
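For illustration only (this is not the authors' algorithm or analysis), the following minimal NumPy sketch runs the synchronous iteration $x_{k+1} = x_k + \gamma(k)\,(h(x_k) + w_k)$ from the abstract. The diminishing stepsize $\gamma(k) = 1/(k+1)$, the Gaussian noise standing in for $w_k$, and the example map $T$ (whose fixed point the iteration seeks via $h(x) = T(x) - x$) are all assumptions made for the sketch.

```python
import numpy as np

def stochastic_approximation(h, x0, num_iters=10_000, noise_std=0.1, seed=0):
    """Synchronous iteration x_{k+1} = x_k + gamma(k) * (h(x_k) + w_k).

    h         : callable implementing the map h (assumed suitable)
    x0        : initial iterate
    gamma(k)  : diminishing stepsize, here 1/(k+1), chosen so that
                sum_k gamma(k) = inf and sum_k gamma(k)^2 < inf
    w_k       : zero-mean Gaussian noise standing in for the stochastic term
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        gamma = 1.0 / (k + 1)                          # stepsize gamma(k)
        w = noise_std * rng.standard_normal(x.shape)   # noise term w_k
        x = x + gamma * (h(x) + w)
    return x

# Hypothetical example: h(x) = T(x) - x for a contractive affine map T,
# so the iteration approximates the fixed point of T (here x = 0.5*x + 1,
# i.e. x* = 2).
if __name__ == "__main__":
    T = lambda x: 0.5 * x + 1.0
    h = lambda x: T(x) - x
    print(stochastic_approximation(h, x0=np.zeros(1)))  # approx. [2.]
```

The sketch only shows the synchronous update with a Robbins-Monro stepsize; it does not implement the asynchronous variant, the boundedness argument based on scaling and Lyapunov functions, or the Q-learning application discussed in the abstract.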