Pseudometrics for State Aggregation in Average Reward Markov Decision Processes

  • Authors: Ronald Ortner
  • Affiliations: University of Leoben, A-8700 Leoben, Austria
  • Venue: ALT '07 Proceedings of the 18th International Conference on Algorithmic Learning Theory
  • Year: 2007

Abstract

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics, which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be incurred by working on the aggregated instead of the original MDP are given and compared to the bounds that have been achieved for discounted reward MDPs.
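To illustrate the general idea of aggregation under a pseudometric, the following is a minimal Python sketch, not the paper's construction: it greedily groups states whose pairwise distance under a caller-supplied pseudometric `d` is at most a threshold `eps`. The function name `aggregate_states`, the greedy clustering scheme, and the toy distance are all illustrative assumptions; the paper's adequate pseudometrics are defined from the MDP's rewards and transition probabilities, which this sketch does not model.

```python
# Hypothetical sketch (not from the paper): epsilon-aggregation of states
# under a user-supplied pseudometric d. Since d is only a pseudometric,
# distinct states at distance 0 naturally fall into the same cluster.

def aggregate_states(states, d, eps):
    """Greedily partition `states` into clusters whose pairwise
    distance under `d` is at most `eps`.

    Returns a list of clusters (lists of states), which would serve as
    the states of the aggregated MDP.
    """
    clusters = []
    for s in states:
        for cluster in clusters:
            # Place s into the first cluster it is eps-close to entirely.
            if all(d(s, t) <= eps for t in cluster):
                cluster.append(s)
                break
        else:
            # No existing cluster fits; s starts a new one.
            clusters.append([s])
    return clusters


# Toy usage: states on a line with d(x, y) = |x - y| and eps = 0.5.
if __name__ == "__main__":
    states = [0.0, 0.1, 0.4, 1.0, 1.1, 2.0]
    print(aggregate_states(states, lambda x, y: abs(x - y), 0.5))
    # -> [[0.0, 0.1, 0.4], [1.0, 1.1], [2.0]]
```

With an adequate pseudometric in place of the toy distance, the quality of such an aggregation is what the paper's upper bounds control: states merged at small distance induce a correspondingly small loss in average reward.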