Partially Observable Markov Decision Processes and Performance Sensitivity Analysis

  • Authors: Yanjie Li; Baoqun Yin; Hongsheng Xi

  • Affiliations: Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei

  • Venue: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
  • Year: 2008

Abstract

Sensitivity-based optimization of Markov systems has become an increasingly important area. From the perspective of performance sensitivity analysis, policy-iteration algorithms and gradient-estimation methods can be derived directly for Markov decision processes (MDPs). In this correspondence, sensitivity-based optimization is extended to average-reward partially observable MDPs (POMDPs). We derive the performance-difference and performance-derivative formulas for POMDPs. Based on the performance-derivative formula, we present a new method for estimating performance gradients. From the performance-difference formula, we obtain a sufficient optimality condition that does not rely on the discounted-reward formulation. We also propose a policy-iteration algorithm that yields a nearly optimal finite-state-controller policy.
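
As background for the two formulas named in the abstract, the sketch below numerically checks their standard fully observable (ergodic Markov chain) counterparts. It is not the paper's POMDP derivation. For two chains (P, f) and (P', f') with stationary distributions π and π', average rewards η = πf and η' = π'f', and performance potentials g solving the Poisson equation (I - P)g + η1 = f, the performance difference is η' - η = π'[(P' - P)g + (f' - f)], and the derivative along the path P + δ(P' - P), f + δ(f' - f) at δ = 0 is π[(P' - P)g + (f' - f)]. The matrices, reward vectors, and the function names `stationary` and `potentials` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of an ergodic chain: pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    # Stack the balance equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def potentials(P, f):
    """Performance potentials g from (I - P + 1 pi) g = f (normalized so pi g = eta)."""
    n = P.shape[0]
    pi = stationary(P)
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)

# Hypothetical 3-state example: (P, f) is the current policy, (P2, f2) another one.
P = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
f = np.array([1.0, 0.0, 2.0])
P2 = np.array([[0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.6, 0.1, 0.3]])
f2 = np.array([0.5, 1.5, 1.0])

pi, pi2 = stationary(P), stationary(P2)
eta, eta2 = pi @ f, pi2 @ f2
g = potentials(P, f)

# Performance-difference formula: uses pi of the NEW chain, g of the OLD chain.
diff_formula = pi2 @ ((P2 - P) @ g + (f2 - f))

# Performance-derivative formula: uses pi and g of the OLD chain only.
deriv_formula = pi @ ((P2 - P) @ g + (f2 - f))

# Finite-difference check of the derivative along the randomized path.
delta = 1e-6
eta_delta = stationary(P + delta * (P2 - P)) @ (f + delta * (f2 - f))
deriv_fd = (eta_delta - eta) / delta

print("eta' - eta      :", eta2 - eta)
print("difference form :", diff_formula)
print("derivative form :", deriv_formula)
print("finite diff     :", deriv_fd)
```

In the fully observable case, the difference formula is what motivates policy iteration: if the bracketed term is nonnegative in every state, the new policy cannot be worse, since π' is a positive vector. The derivative formula needs only quantities of the current chain, which is why it lends itself to gradient estimation from a single sample path.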