A key open problem in reinforcement learning is to ensure convergence when a compact hypothesis class is used to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is the set of linear combinations of fixed basis functions, it may diverge with a general (non-linear) hypothesis class. This paper describes the Bridge algorithm, a new method for reinforcement learning, and shows that it converges to an approximate global optimum for any agnostically learnable hypothesis class. Convergence is demonstrated on a simple example for which temporal-difference learning fails. Weak conditions are identified under which the Bridge algorithm converges for any hypothesis class. Finally, connections are made between the complexity of reinforcement learning and the PAC-learnability of the hypothesis class.
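To make the convergent setting mentioned above concrete, the following is a minimal sketch of TD(0) with a linear hypothesis class (a linear combination of fixed basis functions), the case in which temporal-difference learning is known to converge. The two-state chain MDP and all parameter values here are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def td0_linear(episodes=5000, alpha=0.05, gamma=0.9):
    """TD(0) with linear value-function approximation V(s) ~= phi(s) @ theta.

    Uses a deterministic two-state chain: state 0 -> state 1 (reward 0),
    then state 1 -> terminal (reward 1). One-hot features make this the
    tabular special case of a linear hypothesis class.
    """
    phi = np.eye(2)            # fixed basis functions (one-hot per state)
    theta = np.zeros(2)        # weight vector to learn
    for _ in range(episodes):
        # Transition 0 -> 1, reward 0: theta += alpha * TD-error * phi(s)
        delta = 0.0 + gamma * phi[1] @ theta - phi[0] @ theta
        theta += alpha * delta * phi[0]
        # Transition 1 -> terminal, reward 1 (terminal state has value 0)
        delta = 1.0 - phi[1] @ theta
        theta += alpha * delta * phi[1]
    return theta               # true values: V(1) = 1, V(0) = gamma = 0.9

theta = td0_linear()
```

With a linear (here tabular) hypothesis class, the weights converge to the true values; the divergence phenomenon the abstract refers to arises only once the hypothesis class is non-linear.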