Partially Observable Markov Decision Processes and Performance Sensitivity Analysis

Authors:
Yanjie Li;Baoqun Yin;Hongsheng Xi
Affiliations:
Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei;-;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
2008

Abstract

The sensitivity-based optimization of Markov systems has become an increasingly important area. From the perspective of performance sensitivity analysis, policy-iteration algorithms and gradient estimation methods can be obtained directly for Markov decision processes (MDPs). In this correspondence, sensitivity-based optimization is extended to average-reward partially observable MDPs (POMDPs). We derive the performance-difference and performance-derivative formulas of POMDPs. On the basis of the performance-derivative formula, we present a new method to estimate the performance gradients. From the performance-difference formula, we obtain a sufficient optimality condition without resorting to the discounted-reward formulation. We also propose a policy-iteration algorithm to obtain a nearly optimal finite-state-controller policy.
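The abstract does not state the formulas themselves. For orientation, the following is a minimal sketch of their well-known MDP counterparts from sensitivity-based optimization, which the correspondence extends to POMDPs. It is not the authors' POMDP derivation. For an ergodic chain with transition matrix P, reward vector r, stationary distribution π and average reward η = πr, the performance potentials g solve the Poisson equation (I − P + eπ)g = r, where e is a column of ones. For a second policy (P', r') with stationary distribution π', the performance-difference formula is η' − η = π'[(P' − P)g + (r' − r)]. Along P + δ(P' − P), r + δ(r' − r), the performance derivative at δ = 0 is π[(P' − P)g + (r' − r)]. The code assumes full state observation, a small hand-picked 3-state example, and plain Python with no libraries. Every function name and number in it is hypothetical.

```python
# Minimal sketch (assumption: fully observed, finite, ergodic Markov chains).
# Checks the MDP performance-difference and performance-derivative formulas
# numerically; all names and example numbers are hypothetical.

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                for c in range(col, n + 1):
                    m[i][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(p, v):
    return [dot(row, v) for row in p]

def stationary(p):
    """Stationary distribution: pi P = pi, sum(pi) = 1."""
    n = len(p)
    # Row j encodes sum_i pi_i (delta_ij - p_ij) = 0.
    a = [[(1.0 if i == j else 0.0) - p[i][j] for i in range(n)] for j in range(n)]
    a[-1] = [1.0] * n  # replace one redundant balance equation by normalization
    return solve(a, [0.0] * (n - 1) + [1.0])

def potentials(p, r, pi):
    """Performance potentials g from the Poisson equation (I - P + e pi) g = r."""
    n = len(p)
    a = [[(1.0 if i == j else 0.0) - p[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    return solve(a, r)

def avg_reward(p, r):
    return dot(stationary(p), r)

def _sensitivity_terms(p, r, p2, r2):
    """Vector (P' - P) g + (r' - r), with g the potentials of (P, r)."""
    g = potentials(p, r, stationary(p))
    return [a - b + (x2 - x)
            for a, b, x2, x in zip(matvec(p2, g), matvec(p, g), r2, r)]

def perf_difference(p, r, p2, r2):
    """Right-hand side of eta' - eta = pi' [(P' - P) g + (r' - r)]."""
    return dot(stationary(p2), _sensitivity_terms(p, r, p2, r2))

def perf_derivative(p, r, p2, r2):
    """d eta / d delta at 0 along P + delta (P' - P), r + delta (r' - r)."""
    return dot(stationary(p), _sensitivity_terms(p, r, p2, r2))

def mix(p, r, p2, r2, d):
    """Interpolate linearly between (P, r) and (P', r') with weight d."""
    pm = [[(1 - d) * a + d * b for a, b in zip(ra, rb)] for ra, rb in zip(p, p2)]
    rm = [(1 - d) * a + d * b for a, b in zip(r, r2)]
    return pm, rm

P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
r = [1.0, 0.0, 2.0]
P2 = [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]
r2 = [1.5, 0.5, 1.0]

exact = avg_reward(P2, r2) - avg_reward(P, r)
print(exact, perf_difference(P, r, P2, r2))  # should agree to rounding error

h = 1e-6
fd = (avg_reward(*mix(P, r, P2, r2, h)) - avg_reward(P, r)) / h
print(fd, perf_derivative(P, r, P2, r2))  # should agree up to O(h)
```

Because π' has strictly positive entries for an ergodic chain, any (P', r') that makes (P' − P)g + (r' − r) componentwise nonnegative, with at least one strict inequality, yields η' > η. Choosing actions state by state to maximize r + Pg is the policy-improvement step of MDP policy iteration. In a POMDP the policy cannot condition directly on the hidden state, so this state-by-state improvement does not carry over unchanged. The abstract's policy-iteration algorithm instead targets finite-state-controller policies.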