Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm

  • Authors:
  • Sumit Kunnumkal; Huseyin Topaloglu

  • Affiliations:
  • Indian School of Business, Gachibowli, Hyderabad 500032, India; School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853

  • Venue:
  • INFORMS Journal on Computing
  • Year:
  • 2008

Abstract

This paper shows how to exploit the structural properties of the underlying Markov decision problem to improve the convergence behavior of the Q-learning algorithm. In particular, we consider infinite-horizon discounted-cost Markov decision problems where there is a natural ordering between the states of the system and the value function is known to be monotone in the state. We propose a new variant of the Q-learning algorithm that ensures that the value function approximations obtained during the intermediate iterations are also monotone in the state. We establish the convergence of the proposed algorithm and experimentally show that it significantly improves the convergence behavior of the standard version of the Q-learning algorithm.
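
The abstract states the key idea at a high level: after each Q-learning update, the value function approximation is projected back onto the set of functions that are monotone in the state, so every intermediate iterate respects the known structure. The sketch below is a minimal illustration of that idea, not the paper's algorithm: the MDP inputs (`P`, `c`), the uniform exploration scheme, the step-size rule, and the specific projection (lifting Q-values via `np.maximum.accumulate` so that V(s) = min_a Q(s, a) is nondecreasing) are all assumptions made for this example; the paper's projection operator and convergence conditions differ in detail.

```python
import numpy as np

def monotone_q_learning(P, c, gamma=0.95, n_iters=50000, seed=0):
    """Q-learning sketch with a monotonicity-enforcing projection.

    Hypothetical setup (not taken from the paper): states 0..S-1 are
    ordered, costs are discounted, and the optimal value function
    V(s) = min_a Q(s, a) is known to be nondecreasing in s.

    P : (S, A, S) array of transition probabilities P[s, a, s'].
    c : (S, A) array of one-step costs.
    """
    rng = np.random.default_rng(seed)
    S, A = c.shape
    Q = np.zeros((S, A))
    for t in range(1, n_iters + 1):
        s = rng.integers(S)                  # uniform state sampling
        a = rng.integers(A)                  # uniform exploration
        s_next = rng.choice(S, p=P[s, a])    # sample next state
        alpha = 10.0 / (10.0 + t)            # diminishing step size
        target = c[s, a] + gamma * Q[s_next].min()
        Q[s, a] += alpha * (target - Q[s, a])
        # Projection step: restore monotonicity of V(s) = min_a Q(s, .)
        # by lifting each state's Q-values to the smallest nondecreasing
        # envelope of V.  The paper's projection may differ in detail.
        V = Q.min(axis=1)
        V_mono = np.maximum.accumulate(V)
        Q += (V_mono - V)[:, None]
    return Q

# Tiny example: a random 5-state, 2-action discounted-cost MDP whose
# costs increase with the state index, so the monotone structure holds.
if __name__ == "__main__":
    S, A = 5, 2
    gen = np.random.default_rng(1)
    P = gen.dirichlet(np.ones(S), size=(S, A))
    c = np.sort(gen.random((S, A)), axis=0)
    Q = monotone_q_learning(P, c)
    print("V =", Q.min(axis=1))  # nondecreasing by construction
```

The projection keeps every intermediate approximation inside the monotone set, which is the mechanism the abstract credits for the improved convergence behavior relative to standard Q-learning.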