Q-learning with linear function approximation

Authors:
Francisco S. Melo; M. Isabel Ribeiro
Affiliations:
Institute for Systems and Robotics, Instituto Superior Técnico, Lisboa, Portugal (both authors)
Venue:
COLT'07 Proceedings of the 20th annual conference on Learning theory
Year:
2007

Abstract

In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that implies the convergence of this method with probability 1, when a fixed learning policy is used. We discuss the differences and similarities between our results and those obtained in several related works. We also discuss the applicability of this method when a changing policy is used. Finally, we describe the applicability of this approximate method in partially observable scenarios.
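
For readers unfamiliar with the method named in the abstract, the following is a minimal sketch of Q-learning with linear function approximation under a fixed learning policy. It is an illustration only, not the authors' formulation or their convergence conditions. The two-state MDP, the one-hot feature map, the constant step size, and all function names (`step`, `features`, `q_value`, `run_q_learning`) are assumptions made for this example. The approximation is Q_theta(x, a) = phi(x, a)^T theta, and each sample moves theta along phi(x, a) scaled by the temporal-difference error.

```python
import random

GAMMA = 0.9                 # discount factor (assumed for this toy example)
N_STATES, N_ACTIONS = 2, 2


def step(state, action):
    """Hypothetical toy MDP: action 0 stays, action 1 switches state.
    The reward is 1 when the next state is 1, and 0 otherwise."""
    next_state = state if action == 0 else 1 - state
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def features(state, action):
    """One-hot features over (state, action) pairs, the simplest linear
    architecture (it reduces to the tabular case)."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    phi[state * N_ACTIONS + action] = 1.0
    return phi


def q_value(theta, state, action):
    """Linear approximation: inner product of features and parameters."""
    return sum(t * f for t, f in zip(theta, features(state, action)))


def run_q_learning(n_steps=20000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (N_STATES * N_ACTIONS)
    state = 0
    for _ in range(n_steps):
        # Fixed learning policy: uniform random, independent of theta.
        action = rng.randrange(N_ACTIONS)
        next_state, reward = step(state, action)
        best_next = max(q_value(theta, next_state, b) for b in range(N_ACTIONS))
        td_error = reward + GAMMA * best_next - q_value(theta, state, action)
        phi = features(state, action)
        # Parameter update along the feature vector of the visited pair.
        theta = [t + alpha * td_error * f for t, f in zip(theta, phi)]
        state = next_state
    return theta


theta = run_q_learning()
print(theta)
```

In this toy problem the optimal Q-values can be worked out by hand: 10 for the action that leads to state 1 and 9 for the action that leads to state 0. The learned parameters should approach those values. The constant step size is a simplification that suits this deterministic example. With features that generalize across states, convergence is not automatic, which is the question the paper addresses.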