Ergodic control of discrete-time controlled Markov chains with a locally compact state space and a compact action space is considered under suitable stability, irreducibility, and Feller continuity conditions. A flexible family of controls, called action time sharing (ATS) policies, associated with a given continuous stationary Markov control, is introduced. It is shown that the long-term average cost for such a control policy, for a broad range of one-stage cost functions, is the same as that for the associated stationary Markov policy. In addition, ATS policies are well suited to a range of estimation, information collection, and adaptive control goals. To illustrate the possibilities, we present two examples. The first demonstrates the construction of an ATS policy that yields consistent estimators for unknown model parameters while achieving the desired long-term average cost value. The second considers a setting where the target stationary Markov control $q$ is unknown, but sampling schemes are available that allow for consistent estimation of $q$. We construct an ATS policy that uses dynamic estimators of $q$ for control decisions and show that the associated cost coincides with that of the unknown Markov control $q$.
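To make the time-sharing idea concrete, here is a minimal simulation sketch in Python. The finite chain, the cost function, and the $n^{-1/2}$ exploration schedule are all illustrative assumptions, not the paper's construction: on a vanishing fraction of the visits to each state the policy deviates from the target control $q$ for information collection, and the empirical long-run average cost still matches that of $q$.

import numpy as np

rng = np.random.default_rng(1)

# Toy finite model (assumed for illustration): 3 states, 2 actions.
n_states, n_actions = 3, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = law of the next state
c = rng.random((n_states, n_actions))                             # one-stage cost c(s, a)
q = rng.dirichlet(np.ones(n_actions), size=n_states)              # target stationary Markov control q(.|s)

def average_cost(T, ats):
    """Simulate T steps and return the empirical average cost.

    With ats=True, an ATS-style policy is used: at the n-th visit to a
    state it deviates with probability n**-0.5 (an assumed schedule whose
    deviation times have density zero) to a uniform exploration action;
    otherwise it samples the action from q(.|s).
    """
    s, total, visits = 0, 0.0, np.zeros(n_states)
    for _ in range(T):
        visits[s] += 1
        if ats and rng.random() < visits[s] ** -0.5:
            a = rng.integers(n_actions)             # density-zero deviation for estimation
        else:
            a = int(rng.choice(n_actions, p=q[s]))  # follow the target control
        total += c[s, a]
        s = int(rng.choice(n_states, p=P[s, a]))
    return total / T

# For large T the two values should be close, mirroring the result that
# ATS deviations leave the long-term average cost unchanged.
print(average_cost(200_000, ats=False), average_cost(200_000, ats=True))

For a long horizon the two printed averages agree closely: among n visits to a state the deviations occur roughly 2*sqrt(n) times, so their long-run frequency is zero and the state-action frequencies under the ATS policy converge to those induced by $q$.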