A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases

  • Authors:
  • Xi-Ren Cao; Xianping Guo

  • Affiliations:
  • Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong; Zhongshan University, Guangzhou, PR China

  • Venue:
  • Automatica (Journal of IFAC)
  • Year:
  • 2004

Abstract

We propose a unified framework for Markov decision problems and performance sensitivity analysis of multichain Markov processes under both discounted and average-cost performance criteria. Using the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established from the performance-difference formulas in a simple and intuitive way, and the performance-gradient formulas, combined with stochastic approximation, may lead to new optimization schemes. This sensitivity-based view of performance optimization offers insights that link perturbation analysis, Markov decision processes, and reinforcement learning. The research extends previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
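To make concrete how a performance-difference formula yields policy iteration, the sketch below implements the standard algorithm for a finite discounted MDP only; the paper's multichain average-cost treatment is more involved and is not reproduced here. The names (policy_iteration, P, r, beta) and the array layout are illustrative assumptions, not the paper's notation: evaluation solves for the potential-like value vector g of the current policy, and improvement picks actions that the difference formula guarantees cannot decrease performance.

```python
import numpy as np

def policy_iteration(P, r, beta=0.9, max_iter=100):
    """Policy iteration for a finite discounted MDP (illustrative sketch).

    P: (A, S, S) array, P[a, s, s'] = transition probability under action a.
    r: (A, S) array of one-step rewards.
    beta: discount factor in (0, 1).
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    g = np.zeros(S)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_pi) g = r_pi for the
        # value vector g of the current policy.
        P_pi = P[policy, np.arange(S)]          # (S, S) rows under current actions
        r_pi = r[policy, np.arange(S)]          # (S,)
        g = np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)
        # Policy improvement: by the performance-difference formula,
        # maximizing r(a, s) + beta * P(a, s, :) @ g state-by-state
        # yields a policy that is at least as good.
        q = r + beta * P @ g                    # (A, S)
        new_policy = np.argmax(q, axis=0)
        if np.array_equal(new_policy, policy):  # no improving action anywhere
            break
        policy = new_policy
    return policy, g

# Usage on a small random MDP (hypothetical data, for illustration only):
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)               # normalize rows to probabilities
r = rng.random((2, 3))
policy, g = policy_iteration(P, r, beta=0.95)
print("policy:", policy, "values:", g)
```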