A time aggregation approach to Markov decision processes

  • Authors:
  • Xi-Ren Cao;Zhiyuan Ren;Shalabh Bhatnagar;Michael Fu;Steven Marcus

  • Affiliations:
  • Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong;Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong and Department of Electrical and Electronic Engineering ...;Institute for Systems Research, University of Maryland at College Park, USA;Institute for Systems Research, University of Maryland at College Park, USA;Institute for Systems Research, University of Maryland at College Park, USA

  • Venue:
  • Automatica (Journal of IFAC)
  • Year:
  • 2002

Quantified Score

Hi-index 22.15

Abstract

We propose a time aggregation approach for the solution of infinite horizon average cost Markov decision processes via policy iteration. In this approach, the policy is updated only when the process visits a subset of the state space. As in state aggregation, this approach leads to a reduced state space, which can substantially reduce computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of the Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single sample path-based estimation algorithms are developed that allow the time aggregation approach to be implemented on-line for practical systems. Some numerical and simulation examples are presented to illustrate the ideas and potential computational savings.
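
The claim that observing the process only at visits to a subset of states loses no accuracy can be illustrated for a fixed policy with a small numerical sketch. It is not the authors' algorithm, only a standard embedded-chain computation under stated assumptions: the function names `embedded_chain` and `stationary`, the 4-state transition matrix, the cost vector, and the chosen subset are all hypothetical, and the chain is assumed irreducible. The sketch builds the chain embedded at visits to the subset, accumulates the expected cost and length of each segment between visits, and recovers the average cost of the full chain as a ratio of the two.

```python
import numpy as np

def embedded_chain(P, cost, subset):
    """Chain observed only at visits to `subset` (illustrative helper).

    Returns the embedded transition matrix, the expected cost accumulated
    over one segment (from a visit to the subset until the next visit),
    and the expected segment length, each indexed by the subset states.
    """
    n = P.shape[0]
    members = set(subset)
    s1 = np.array(sorted(members), dtype=int)
    s2 = np.array([i for i in range(n) if i not in members], dtype=int)
    P11, P12 = P[np.ix_(s1, s1)], P[np.ix_(s1, s2)]
    P21, P22 = P[np.ix_(s2, s1)], P[np.ix_(s2, s2)]
    # N[k, l]: expected visits to outside state l, starting from outside
    # state k, before the process re-enters the subset.
    N = np.linalg.inv(np.eye(len(s2)) - P22)
    P_emb = P11 + P12 @ N @ P21
    seg_cost = cost[s1] + P12 @ N @ cost[s2]
    seg_len = 1.0 + P12 @ N @ np.ones(len(s2))
    return P_emb, seg_cost, seg_len

def stationary(P):
    """Stationary distribution of an irreducible stochastic matrix."""
    n = P.shape[0]
    # Solve pi (P - I) = 0 together with sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical 4-state chain under a fixed policy, with per-step costs.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.3, 0.5, 0.0],
              [0.0, 0.3, 0.3, 0.4],
              [0.1, 0.0, 0.4, 0.5]])
cost = np.array([1.0, 2.0, 3.0, 4.0])

# Average cost computed on the full state space.
eta_full = stationary(P) @ cost

# Average cost computed from the chain embedded at states {0, 1}.
P_emb, seg_cost, seg_len = embedded_chain(P, cost, [0, 1])
pi_emb = stationary(P_emb)
eta_agg = (pi_emb @ seg_cost) / (pi_emb @ seg_len)

print(eta_full, eta_agg)  # the two values agree up to rounding
```

The embedded matrix has only as many rows as the subset has states, which is where the storage saving comes from; the price is computing (or, on-line, estimating from a sample path) the per-segment cost and length.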