Adaptive algorithms and stochastic approximations.
Feature-based methods for large scale dynamic programming. Machine Learning (special issue on reinforcement learning).
Stochastic approximation with two time scales. Systems & Control Letters.
On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications.
Neuro-Dynamic Programming.
Kernel-based reinforcement learning. Machine Learning.
Off-policy temporal difference learning with function approximation. Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01).
On the existence of fixed points for Q-learning and Sarsa in partially observable domains. Proceedings of the Nineteenth International Conference on Machine Learning (ICML '02).
Stable function approximation in dynamic programming.
A learning algorithm for discrete-time stochastic control. Probability in the Engineering and Informational Sciences.
Interpolation-based Q-learning. Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04).
Model-free reinforcement learning as mixture learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Coordinated learning in multiagent MDPs with infinite state-space. Autonomous Agents and Multi-Agent Systems.
Proceedings of the 11th International Conference on Adaptive and Learning Agents (ALA '11).
Sparse gradient-based direct policy search. Proceedings of the 19th International Conference on Neural Information Processing (ICONIP '12), Part IV.
The Journal of Machine Learning Research.
Journal of Intelligent and Robotic Systems.
Policy oscillation is overshooting. Neural Networks.
We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings. We identify conditions under which these approximate methods converge with probability 1. We conclude with a brief discussion of the general applicability of our results and a comparison with several related works.
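As a concrete illustration of the setting the abstract describes, the following is a minimal sketch of Q-learning with linear function approximation. It is not the paper's exact algorithm: the environment interface `env`, the feature map `phi`, the epsilon-greedy exploration, and the 1/t step-size schedule are all assumptions introduced here for the example.

```python
# A minimal sketch (not the paper's construction) of Q-learning with linear
# function approximation. `env`, `phi`, epsilon-greedy exploration, and the
# 1/t step sizes are illustrative assumptions.
import numpy as np

def linear_q_learning(env, phi, n_features, n_actions,
                      gamma=0.99, epsilon=0.1, n_steps=100_000, seed=0):
    """phi(state, action) must return a numpy vector of length n_features."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_features)   # weights: Q(s, a) is approximated by phi(s, a) @ theta
    state = env.reset()
    for t in range(1, n_steps + 1):
        q_values = [phi(state, a) @ theta for a in range(n_actions)]
        # Epsilon-greedy action choice; convergence analyses typically
        # require that every action keeps being tried (sufficient exploration).
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q_values))
        next_state, reward, done = env.step(action)
        # Q-learning TD target: bootstrap with the greedy value at next_state.
        q_next = 0.0 if done else max(phi(next_state, a) @ theta
                                      for a in range(n_actions))
        td_error = reward + gamma * q_next - phi(state, action) @ theta
        # Stochastic-approximation update with a diminishing step size.
        alpha = 1.0 / t
        theta += alpha * td_error * phi(state, action)
        state = env.reset() if done else next_state
    return theta
```

The 1/t schedule is one choice satisfying the standard stochastic-approximation step-size conditions (the step sizes sum to infinity while their squares sum to a finite value), which is the kind of requirement under which convergence with probability 1 is typically established for methods in this family.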