On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Authors:
Tommi Jaakkola;Michael I. Jordan;Satinder P. Singh
Affiliations:
-;-;-
Venue:
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Year:
1993

Citing 0
Cited 1

Scaling ant colony optimization with hierarchical reinforcement learning partitioning

Proceedings of the 10th annual conference on Genetic and evolutionary computation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.