Fast convergence to state-action frequency polytopes for MDPs
Operations Research Letters
We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some policy, in a strong sense. Furthermore, we show that the probability of exceeding a given distance between the empirical frequency vector and the polytope decays exponentially with time under every policy. We provide similar results for vector-valued empirical rewards.
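The convergence described above can be illustrated on a toy example. The sketch below (all numbers are illustrative assumptions, not taken from the paper) simulates a small two-state, two-action MDP under a fixed stationary randomized policy, computes the empirical state-action frequency vector, and compares it with the stationary state-action distribution induced by that policy, which is one particular point of the state-action frequency polytope.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (hypothetical numbers).
# P[a][s, s'] = probability of moving from state s to s' under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
n_states, n_actions = 2, 2

# A fixed stationary randomized policy pi[s, a] = P(a | s).
pi = np.array([[0.7, 0.3], [0.4, 0.6]])

def empirical_frequencies(T, rng):
    """Simulate T steps; return the empirical state-action frequency
    vector f[s, a] = (1/T) * #{t < T : (s_t, a_t) = (s, a)}."""
    f = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(T):
        a = rng.choice(n_actions, p=pi[s])
        f[s, a] += 1.0
        s = rng.choice(n_states, p=P[a][s])
    return f / T

# The stationary state-action distribution x[s, a] = mu(s) * pi(a | s),
# where mu is the stationary distribution of the induced Markov chain
# P_pi[s, s'] = sum_a pi(a | s) P(s' | s, a). This x is one point of the
# state-action frequency polytope.
P_pi = sum(pi[:, a][:, None] * P[a] for a in range(n_actions))
eigvals, eigvecs = np.linalg.eig(P_pi.T)
mu = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
mu /= mu.sum()
x = mu[:, None] * pi

rng = np.random.default_rng(0)
f = empirical_frequencies(50_000, rng)
print(np.abs(f - x).max())  # small: empirical frequencies approach the polytope
```

Under the unichain policy used here the empirical vector converges to the single induced stationary point; the paper's result is stronger, covering every point of the polytope under some policy and giving exponentially decaying bounds on the probability of a large deviation from the polytope.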