Brief paper: New algorithms of the Q-learning type

  • Authors:
  • Shalabh Bhatnagar; K. Mohan Babu

  • Affiliations:
  • Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India; Motorola India Electronics Ltd., Bangalore, India

  • Venue:
  • Automatica (Journal of IFAC)
  • Year:
  • 2008

Abstract

We propose two Q-learning algorithms based on the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the 'current' randomized policy updates. We prove convergence of both algorithms. Finally, we present numerical experiments that apply the proposed algorithms to routing in communication networks in a few different settings.
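
As a rough illustration of the two-timescale idea mentioned in the abstract, the following Python sketch runs two coupled recursions with different step-size schedules: Q-values for all state-action pairs are updated on the faster schedule, and the parameters of a randomized (softmax) policy are updated on the slower one. This is not the authors' algorithm. The toy cost table `COST`, the transition table `NEXT`, the softmax parameterisation, the discount factor, and the step-size exponents are all assumptions made for illustration only.

```python
import math

# Hypothetical toy model (an assumption, not taken from the paper):
# 2 states, 2 actions, deterministic transitions, action 0 is cheaper.
COST = [[1.0, 2.0], [1.0, 2.0]]   # COST[s][a]: one-step cost
NEXT = [[1, 1], [0, 0]]           # NEXT[s][a]: next state
GAMMA = 0.9                       # discount factor (assumed)


def step_sizes(n):
    """Return (fast, slow) step sizes; slow / fast -> 0 as n grows."""
    fast = 1.0 / (n + 1) ** 0.6
    slow = 1.0 / (n + 1)
    return fast, slow


def softmax(prefs):
    """Turn action preferences into a randomized policy (probabilities)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def two_timescale_q(num_iters=20000):
    n_states, n_actions = len(COST), len(COST[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    prefs = [[0.0] * n_actions for _ in range(n_states)]
    for n in range(num_iters):
        fast, slow = step_sizes(n)
        # Faster timescale: update every state-action pair from the old table.
        old_q = [row[:] for row in q]
        for s in range(n_states):
            for a in range(n_actions):
                target = COST[s][a] + GAMMA * min(old_q[NEXT[s][a]])
                q[s][a] = old_q[s][a] + fast * (target - old_q[s][a])
        # Slower timescale: shift preferences toward lower-cost actions.
        for s in range(n_states):
            policy = softmax(prefs[s])
            baseline = sum(p * v for p, v in zip(policy, q[s]))
            for a in range(n_actions):
                prefs[s][a] -= slow * (q[s][a] - baseline)
    policy = [softmax(row) for row in prefs]
    return q, policy


if __name__ == "__main__":
    q_values, policy = two_timescale_q()
    print(q_values)
    print(policy)
```

In this toy model the discounted-cost fixed point is Q(s, 0) = 10 and Q(s, 1) = 11 for both states, so the Q-table should approach those values while the policy concentrates on action 0. Because the slow step size shrinks faster than the fast one, the policy recursion sees Q-values that have nearly settled, which is the separation that the two-timescale methodology relies on.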