Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
ALT '07 Proceedings of the 18th international conference on Algorithmic Learning Theory
In ergodic MDPs, we consider the stationary distributions of policies that coincide in all but n states; in each of these n states, one of two possible actions is chosen. We give conditions and formulas for the linear dependence of the stationary distributions of n+2 such policies, and show some results about combinations and mixtures of policies.
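
The linear dependence mentioned in the abstract can be checked numerically on a small example. The following is a minimal sketch, not taken from the paper: it assumes a randomly generated MDP with strictly positive transition probabilities (so every policy is ergodic), two actions per state, and n = 2 free states, which gives exactly n + 2 = 4 deterministic policies. The names `stationary_distribution`, `P_actions`, and `free_states` are hypothetical and chosen only for this illustration.

```python
import itertools

import numpy as np


def stationary_distribution(P):
    """Solve mu P = mu with sum(mu) = 1 for an ergodic transition matrix P."""
    size = P.shape[0]
    # Stack the balance equations (P^T - I) mu = 0 with the normalization row.
    A = np.vstack([P.T - np.eye(size), np.ones((1, size))])
    b = np.zeros(size + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu


rng = np.random.default_rng(0)
num_states, n = 6, 2  # assumption: 6 states, policies differ in n = 2 of them

# Assumption: two actions per state, all transition probabilities positive,
# which makes every induced Markov chain ergodic.
P_actions = rng.random((2, num_states, num_states)) + 0.1
P_actions /= P_actions.sum(axis=2, keepdims=True)

free_states = [0, 1]  # the n states in which the policies may differ

distributions = []
for choice in itertools.product([0, 1], repeat=n):
    # All other states always use action 0; free states use the chosen action.
    P = P_actions[0].copy()
    for state, action in zip(free_states, choice):
        P[state] = P_actions[action][state]
    distributions.append(stationary_distribution(P))

M = np.array(distributions)  # one stationary distribution per row
rank = np.linalg.matrix_rank(M, tol=1e-8)
print("policies:", M.shape[0], "rank of stationary distributions:", rank)
```

In this sketch the four stationary distributions live in a 6-dimensional space, yet the rank of the stacked matrix should not exceed n + 1 = 3, which is the kind of linear dependence among n + 2 policies that the abstract refers to. The paper's precise conditions and formulas are not reproduced here.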