The control of a two-level Markov decision process by time aggregation

  • Authors: Yat-Wah Wan, Xi-Ren Cao

  • Affiliations: Institute of Global Operations Strategy and Logistics Management, National Dong Hwa University, Hualien, Taiwan; Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong

  • Venue: Automatica (Journal of IFAC)
  • Year: 2006

Abstract

The solution of Markov decision processes (MDPs) often relies on special properties of the processes. For two-level MDPs, the difference between the rates of state changes at the upper and lower levels has led to limiting or approximate solutions of such problems. In this paper, we solve a two-level MDP without making any assumptions about the rates of state changes at the two levels. We first show that such a two-level MDP is a non-standard one in which the optimal actions at different states can be related to each other. We then give conditions under which such a specially constrained MDP can be solved by policy iteration. We further show that the computational effort can be reduced by decomposing the MDP: a two-level MDP with M upper-level states can be decomposed into one MDP for the upper level and between M and M(M-1) MDPs for the lower level, depending on the structure of the two-level MDP. The upper-level MDP is solved by time aggregation, a technique introduced in a recent paper [Cao, X.-R., Ren, Z. Y., Bhatnagar, S., Fu, M., & Marcus, S. (2002). A time aggregation approach to Markov decision processes. Automatica, 38(6), 929-943], and the lower-level MDPs are solved by embedded Markov chains.
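
For readers unfamiliar with the policy iteration primitive that the abstract refers to, the sketch below shows a minimal discounted-reward version for an unconstrained finite MDP. It is an illustration only, not the paper's method: the paper's two-level formulation couples the optimal actions across states and proceeds via time aggregation and embedded Markov chains, none of which appears in this vanilla routine. The array shapes, the discount factor gamma, and the function name policy_iteration are assumptions of this sketch.

```python
import numpy as np

def policy_iteration(P, r, gamma=0.95, max_iters=1000):
    """Vanilla policy iteration for a finite discounted MDP (illustrative).

    P: (A, S, S) array, P[a, s, s'] = transition probability under action a.
    r: (A, S) array, r[a, s] = expected one-step reward of action a in state s.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)              # arbitrary initial policy
    v = np.zeros(S)
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(S), :]        # (S, S) rows under the policy
        r_pi = r[policy, np.arange(S)]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: one-step greedy lookahead over all actions.
        q = r + gamma * (P @ v)                  # (A, S) action values
        new_policy = np.argmax(q, axis=0)
        if np.array_equal(new_policy, policy):   # stable policy is optimal
            break
        policy = new_policy
    return policy, v

# Example on a random 2-action, 4-state MDP (illustrative data only).
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(2, 4))      # each P[a, s, :] sums to 1
r = rng.standard_normal((2, 4))
policy, v = policy_iteration(P, r)
```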