We propose two algorithms for Q-learning based on the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates the Q-values only for state-action pairs whose actions are chosen according to the 'current' randomized policy. A proof of convergence is given for both algorithms. Finally, numerical experiments applying the proposed algorithms to routing in communication networks are presented for a few different settings.
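The first algorithm described above can be sketched as follows. This is a minimal illustrative sketch only, not the authors' algorithm: the toy MDP, the softmax parameterization of the randomized policy, and the particular step-size schedules a(n) and b(n) are all assumptions made here for concreteness. The key two-timescale feature is that the Q-values move on the faster step size a(n) while the randomized-policy parameters move on the slower step size b(n).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (assumed purely for illustration).
n_states, n_actions = 2, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = distribution over next states
R = rng.random((n_states, n_actions))  # rewards in [0, 1)
gamma = 0.9  # discount factor

Q = np.zeros((n_states, n_actions))      # fast-timescale iterates
theta = np.zeros((n_states, n_actions))  # slow-timescale policy parameters (assumed softmax form)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for n in range(1, 5001):
    a_n = 1.0 / n                         # fast step size for the Q-value update
    b_n = 1.0 / (1 + n * np.log(n + 1))   # slower step size for the policy update (b_n / a_n -> 0)

    # Fast timescale: update Q-values of ALL feasible state-action pairs
    # at each instant (the style of the first algorithm in the abstract).
    for s in range(n_states):
        for a in range(n_actions):
            s2 = rng.choice(n_states, p=P[s, a])       # simulated transition
            target = R[s, a] + gamma * Q[s2].max()     # sampled Bellman target
            Q[s, a] += a_n * (target - Q[s, a])

    # Slow timescale: nudge the randomized policy toward the greedy
    # action under the current Q-values.
    for s in range(n_states):
        greedy = np.zeros(n_actions)
        greedy[Q[s].argmax()] = 1.0
        pi = softmax(theta[s])
        theta[s] += b_n * (greedy - pi)
```

The second algorithm in the abstract would differ only in the fast-timescale loop: rather than sweeping all state-action pairs, the action at each visited state would be sampled from the current randomized policy softmax(theta[s]).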