Actor-Critic-Type Learning Algorithms for Markov Decision Processes

  • Authors:
  • Vijaymohan R. Konda; Vivek S. Borkar

  • Venue:
  • SIAM Journal on Control and Optimization
  • Year:
  • 1999

Quantified Score

Hi-index 0.01

Abstract

Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm in the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis involves two-time-scale stochastic approximations.
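
To make the two-time-scale idea concrete, here is a minimal Python sketch of a generic tabular actor-critic scheme. It is an illustration only and is not the specific algorithms analyzed in the paper. The toy two-state MDP, the softmax policy parameterization, the temporal-difference critic, and the step-size exponents are all assumptions chosen for the example. The critic (value estimates) uses the larger step size and the actor (policy preferences) the smaller one, so the ratio of actor step to critic step tends to zero.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    top = max(prefs)
    weights = [math.exp(p - top) for p in prefs]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(state, action, rng):
    """Hypothetical toy MDP: action a reaches state a with probability 0.9.

    The current state does not affect the transition in this toy model.
    Reward is 1 when the next state is 1, else 0.
    """
    next_state = action if rng.random() < 0.9 else 1 - action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def two_timescale_actor_critic(n_steps=20000, gamma=0.9, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]               # critic: state-value estimates
    prefs = [[0.0, 0.0], [0.0, 0.0]]  # actor: softmax preferences per state
    state = 0
    for n in range(1, n_steps + 1):
        # Assumed step sizes: the critic runs on the faster time scale.
        critic_step = (n + 10) ** -0.6
        actor_step = (n + 10) ** -0.9  # actor_step / critic_step -> 0
        probs = softmax(prefs[state])
        action = 0 if rng.random() < probs[0] else 1
        next_state, reward = simulate(state, action, rng)
        # Temporal-difference error from one simulated transition.
        td_error = reward + gamma * values[next_state] - values[state]
        # Critic update (fast).
        values[state] += critic_step * td_error
        # Actor update (slow): move preferences along the softmax
        # score function, weighted by the TD error.
        for a in range(2):
            indicator = 1.0 if a == action else 0.0
            prefs[state][a] += actor_step * td_error * (indicator - probs[a])
        state = next_state
    return [softmax(p) for p in prefs], values


if __name__ == "__main__":
    policy, values = two_timescale_actor_critic()
    print("policy per state:", policy)
    print("value estimates:", values)
```

In this toy model action 1 is the better choice in both states, so the learned policy should put most of its probability on action 1. Each update depends only on one simulated transition, which is also what would make a distributed, asynchronous variant plausible, though none is sketched here.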