We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values, in conjunction with greedy policy selection. We provide convergence results for a number of algorithmic variations, including one that involves temporal difference learning (bootstrapping) instead of Monte Carlo estimation. We also indicate some extensions that either fail or are unlikely to go through.
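The abstract describes the algorithm only in words. The Python sketch below shows one reading of it, under several assumptions:

- costs are discounted with factor `alpha` and are to be minimized;
- every state-action pair is updated once per iteration with a common step size 1/t;
- each Q-value sample comes from a single simulated trajectory that starts with the given state-action pair and then follows the current greedy policy.

The function names `optimistic_pi_monte_carlo` and `q_star_value_iteration` are hypothetical illustrations, not the paper's construction. So are the geometric-stopping trick and the two-state example MDP. Value iteration is included only as a reference for checking the result.

```python
import numpy as np


def optimistic_pi_monte_carlo(P, cost, alpha, num_iters, seed=0):
    """Hypothetical sketch: optimistic policy iteration with Monte Carlo Q-values.

    P[i, u, j] : probability of moving from state i to state j under action u
    cost[i, u] : one-stage cost of action u in state i (costs are minimized)
    alpha      : discount factor in (0, 1)
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))

    for t in range(1, num_iters + 1):
        mu = np.argmin(Q, axis=1)  # greedy policy for the current Q-values
        step = 1.0 / t             # assumed common diminishing step size
        Q_new = Q.copy()
        for i in range(n_states):
            for u in range(n_actions):
                # One trajectory: take action u in state i, then follow mu.
                # Stopping with probability 1 - alpha after each stage (an
                # assumed design choice) makes the undiscounted sum of costs
                # an unbiased sample of the discounted cost Q^mu(i, u).
                state, action, total = i, u, 0.0
                while True:
                    total += cost[state, action]
                    if rng.random() >= alpha:
                        break
                    state = rng.choice(n_states, p=P[state, action])
                    action = mu[state]
                Q_new[i, u] = (1.0 - step) * Q[i, u] + step * total
        Q = Q_new  # synchronous update of all state-action pairs
    return Q, np.argmin(Q, axis=1)


def q_star_value_iteration(P, cost, alpha, num_iters=2000):
    """Reference optimal Q-values via Q-value iteration, for checking only."""
    Q = np.zeros_like(cost, dtype=float)
    for _ in range(num_iters):
        Q = cost + alpha * (P @ Q.min(axis=1))
    return Q


# Hypothetical two-state, two-action example.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # state 0: actions 0 and 1
    [[0.1, 0.9], [0.8, 0.2]],   # state 1: actions 0 and 1
])
cost = np.array([[2.0, 1.0],
                 [0.5, 1.5]])
alpha = 0.9

Q, policy = optimistic_pi_monte_carlo(P, cost, alpha, num_iters=3000)
print("greedy policy:", policy)
print("max |Q - Q*|:", np.abs(Q - q_star_value_iteration(P, cost, alpha)).max())
```

A temporal-difference (bootstrapping) flavor of the same scheme could replace the full trajectory with a shorter one whose tail is estimated from the current Q-values. For example, the one-step target `cost[i, u] + alpha * Q[j, mu[j]]` for a sampled next state `j` is one such choice. That variant, like the step-size rule above, is an illustrative assumption and is not taken from the source.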