Pseudometrics for State Aggregation in Average Reward Markov Decision Processes

  • Authors: Ronald Ortner
  • Affiliations: University of Leoben, A-8700 Leoben, Austria
  • Venue: ALT '07 Proceedings of the 18th International Conference on Algorithmic Learning Theory
  • Year: 2007

Abstract

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics, which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be incurred by working on the aggregated instead of the original MDP are given and compared to the bounds that have been achieved for discounted reward MDPs.
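To illustrate the general idea of aggregation under a pseudometric, the following is a minimal Python sketch, not the paper's construction: it greedily groups states whose pairwise distance under a caller-supplied pseudometric `d` is at most a threshold `eps`. The function name `aggregate_states`, the greedy clustering scheme, and the toy distance are all illustrative assumptions; the paper's adequate pseudometrics are defined from the MDP's rewards and transition probabilities, which this sketch does not model.

```python
# Hypothetical sketch (not from the paper): epsilon-aggregation of states
# under a user-supplied pseudometric d. Since d is only a pseudometric,
# distinct states at distance 0 naturally fall into the same cluster.

def aggregate_states(states, d, eps):
    """Greedily partition `states` into clusters whose pairwise
    distance under `d` is at most `eps`.

    Returns a list of clusters (lists of states), which would serve as
    the states of the aggregated MDP.
    """
    clusters = []
    for s in states:
        for cluster in clusters:
            # Place s into the first cluster it is eps-close to entirely.
            if all(d(s, t) <= eps for t in cluster):
                cluster.append(s)
                break
        else:
            # No existing cluster fits; s starts a new one.
            clusters.append([s])
    return clusters


# Toy usage: states on a line with d(x, y) = |x - y| and eps = 0.5.
if __name__ == "__main__":
    states = [0.0, 0.1, 0.4, 1.0, 1.1, 2.0]
    print(aggregate_states(states, lambda x, y: abs(x - y), 0.5))
    # -> [[0.0, 0.1, 0.4], [1.0, 1.1], [2.0]]
```

With an adequate pseudometric in place of the toy distance, the quality of such an aggregation is what the paper's upper bounds control: states merged at small distance induce a correspondingly small loss in average reward.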