A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases

  • Authors:
  • Xi-Ren Cao; Xianping Guo

  • Affiliations:
  • Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong; Zhongshan University, Guangzhou, PR China

  • Venue:
  • Automatica (Journal of IFAC)
  • Year:
  • 2004

Abstract

We propose a unified framework for Markov decision problems and performance sensitivity analysis of multichain Markov processes under both discounted and average-cost performance criteria. Using the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established from the performance-difference formulas in a simple and intuitive way, and the performance-gradient formulas, combined with stochastic approximation, may lead to new optimization schemes. This sensitivity-based view of performance optimization offers insights that link perturbation analysis, Markov decision processes, and reinforcement learning. The research extends previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
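To make concrete how a performance-difference formula yields policy iteration, the sketch below implements the standard algorithm for a finite discounted MDP only; the paper's multichain average-cost treatment is more involved and is not reproduced here. The names (policy_iteration, P, r, beta) and the array layout are illustrative assumptions, not the paper's notation: evaluation solves for the potential-like value vector g of the current policy, and improvement picks actions that the difference formula guarantees cannot decrease performance.

```python
import numpy as np

def policy_iteration(P, r, beta=0.9, max_iter=100):
    """Policy iteration for a finite discounted MDP (illustrative sketch).

    P: (A, S, S) array, P[a, s, s'] = transition probability under action a.
    r: (A, S) array of one-step rewards.
    beta: discount factor in (0, 1).
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    g = np.zeros(S)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_pi) g = r_pi for the
        # value vector g of the current policy.
        P_pi = P[policy, np.arange(S)]          # (S, S) rows under current actions
        r_pi = r[policy, np.arange(S)]          # (S,)
        g = np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)
        # Policy improvement: by the performance-difference formula,
        # maximizing r(a, s) + beta * P(a, s, :) @ g state-by-state
        # yields a policy that is at least as good.
        q = r + beta * P @ g                    # (A, S)
        new_policy = np.argmax(q, axis=0)
        if np.array_equal(new_policy, policy):  # no improving action anywhere
            break
        policy = new_policy
    return policy, g

# Usage on a small random MDP (hypothetical data, for illustration only):
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)               # normalize rows to probabilities
r = rng.random((2, 3))
policy, g = policy_iteration(P, r, beta=0.95)
print("policy:", policy, "values:", g)
```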