In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that implies the convergence of this method with probability 1, when a fixed learning policy is used. We discuss the differences and similarities between our results and those obtained in several related works. We also discuss the applicability of this method when a changing policy is used. Finally, we describe the applicability of this approximate method in partially observable scenarios.
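
The method analyzed in the abstract can be illustrated with a minimal sketch. It is not the paper's construction and does not check the paper's convergence conditions. It shows only the standard Q-learning update with a linear parameterization, Q(s, a) = theta · phi(s, a), driven by a fixed learning policy. The two-state MDP, the feature map `features`, the helper names, the step size `ALPHA`, and the discount `GAMMA` are all assumptions chosen for illustration. The toy problem is picked so that the optimal Q-function (Q*(s, a) = 1 + a) lies exactly in the span of the features.

```python
import random

GAMMA = 0.5   # discount factor (assumed for this toy example)
ALPHA = 0.1   # constant step size (assumed; analyses typically use decaying steps)
ACTIONS = (0, 1)

def features(state, action):
    # Hypothetical feature map phi(s, a) = [1, a]: two parameters shared
    # across all four state-action pairs.
    return [1.0, float(action)]

def q_value(theta, state, action):
    # Linear approximation Q(s, a) = theta . phi(s, a).
    return sum(w * x for w, x in zip(theta, features(state, action)))

def step(state, action):
    # Toy deterministic dynamics: action a moves the agent to state a,
    # and reaching state 1 pays reward 1.
    next_state = action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

def run_q_learning(num_steps=5000, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    state = 0
    for _ in range(num_steps):
        # Fixed learning policy: uniform random, independent of theta.
        action = rng.choice(ACTIONS)
        next_state, reward = step(state, action)
        # Q-learning target uses the greedy value at the next state.
        target = reward + GAMMA * max(q_value(theta, next_state, b) for b in ACTIONS)
        td_error = target - q_value(theta, state, action)
        # Parameter update along the feature vector of the visited pair.
        phi = features(state, action)
        theta = [w + ALPHA * td_error * x for w, x in zip(theta, phi)]
        state = next_state
    return theta

theta = run_q_learning()
```

In this toy setting the parameters approach theta = [1, 1], which reproduces Q*(s, 0) = 1 and Q*(s, 1) = 2. The sampling policy never depends on `theta`, which is the "fixed learning policy" setting the abstract refers to. Replacing `rng.choice` with a policy derived from the current estimate would correspond to the changing-policy case, which this sketch does not address.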