We propose two algorithms for Q-learning based on the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates the Q-values only for state-action pairs whose actions are chosen according to the 'current' randomized policy. A proof of convergence is given for both algorithms. Finally, numerical experiments applying the proposed algorithms to routing in communication networks are presented for a few different settings.
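The first algorithm described above can be sketched as follows. This is a minimal illustrative sketch only, not the authors' algorithm: the toy MDP, the softmax parameterization of the randomized policy, and the particular step-size schedules a(n) and b(n) are all assumptions made here for concreteness. The key two-timescale feature is that the Q-values move on the faster step size a(n) while the randomized-policy parameters move on the slower step size b(n).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (assumed purely for illustration).
n_states, n_actions = 2, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = distribution over next states
R = rng.random((n_states, n_actions))  # rewards in [0, 1)
gamma = 0.9  # discount factor

Q = np.zeros((n_states, n_actions))      # fast-timescale iterates
theta = np.zeros((n_states, n_actions))  # slow-timescale policy parameters (assumed softmax form)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for n in range(1, 5001):
    a_n = 1.0 / n                         # fast step size for the Q-value update
    b_n = 1.0 / (1 + n * np.log(n + 1))   # slower step size for the policy update (b_n / a_n -> 0)

    # Fast timescale: update Q-values of ALL feasible state-action pairs
    # at each instant (the style of the first algorithm in the abstract).
    for s in range(n_states):
        for a in range(n_actions):
            s2 = rng.choice(n_states, p=P[s, a])       # simulated transition
            target = R[s, a] + gamma * Q[s2].max()     # sampled Bellman target
            Q[s, a] += a_n * (target - Q[s, a])

    # Slow timescale: nudge the randomized policy toward the greedy
    # action under the current Q-values.
    for s in range(n_states):
        greedy = np.zeros(n_actions)
        greedy[Q[s].argmax()] = 1.0
        pi = softmax(theta[s])
        theta[s] += b_n * (greedy - pi)
```

The second algorithm in the abstract would differ only in the fast-timescale loop: rather than sweeping all state-action pairs, the action at each visited state would be sampled from the current randomized policy softmax(theta[s]).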