A time aggregation approach to Markov decision processes

Authors:
Xi-Ren Cao; Zhiyuan Ren; Shalabh Bhatnagar; Michael Fu; Steven Marcus
Affiliations:
Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong; Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong and Department of Electrical and Electronic Engineering ...; Institute for Systems Research, University of Maryland at College Park, USA; Institute for Systems Research, University of Maryland at College Park, USA; Institute for Systems Research, University of Maryland at College Park, USA
Venue:
Automatica (Journal of IFAC)
Year:
2002

Abstract

We propose a time aggregation approach for solving infinite-horizon average-cost Markov decision processes via policy iteration. In this approach, the policy is updated only when the process visits a subset of the state space. As in state aggregation, this yields a reduced state space, which may substantially reduce computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model because the Markov property is lost, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single sample path-based estimation algorithms are developed that allow the time aggregation approach to be implemented online for practical systems. Numerical and simulation examples are presented to illustrate the ideas and the potential computational savings.
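
The claim that observing the process only at visits to a subset of states preserves the Markov property can be illustrated with a small numerical sketch. The code below is not the paper's policy iteration or its sample-path estimation algorithm; it only builds the embedded (censored) chain for a fixed policy using the standard formula P_SS + P_SN (I - P_NN)^{-1} P_NS. The function names `embedded_chain` and `stationary`, the 4-state matrix `P`, and the subset `[0, 2]` are all hypothetical choices made for illustration, and the sketch assumes the subset is reached with probability 1 from every state.

```python
import numpy as np


def embedded_chain(P, subset):
    """Transition matrix of the chain observed only at visits to `subset`.

    Assumes the subset is reached with probability 1 from every state,
    so that (I - P_NN) is invertible.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    S = np.array(sorted(subset), dtype=int)
    N = np.array([i for i in range(n) if i not in set(subset)], dtype=int)
    P_SS = P[np.ix_(S, S)]
    if N.size == 0:
        return P_SS
    P_SN = P[np.ix_(S, N)]
    P_NS = P[np.ix_(N, S)]
    P_NN = P[np.ix_(N, N)]
    # reentry[i, j]: probability that the chain, started at outside state
    # N[i], first re-enters the subset at state S[j].
    reentry = np.linalg.solve(np.eye(N.size) - P_NN, P_NS)
    return P_SS + P_SN @ reentry


def stationary(P):
    """Stationary distribution of an irreducible chain (least-squares solve)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack the balance equations pi (P - I) = 0 with the normalization sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]


# Hypothetical 4-state chain under some fixed policy.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.6, 0.0, 0.0, 0.4],
])
S = [0, 2]

P_emb = embedded_chain(P, S)   # 2x2 stochastic matrix on the subset
pi = stationary(P)             # stationary distribution of the full chain
pi_emb = stationary(P_emb)     # stationary distribution of the embedded chain
pi_restricted = pi[S] / pi[S].sum()

print(P_emb)
print(pi_emb, pi_restricted)
```

In this example the embedded chain is itself a proper Markov chain on two states, and its stationary distribution matches the full chain's stationary distribution restricted to the subset and renormalized. That is the sense in which the reduction is exact here, whereas lumping states together would in general not give a Markov chain.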