Stochastic dynamic programming and the control of queueing systems
Bias Optimality for Continuous-Time Controlled Markov Chains
SIAM Journal on Control and Optimization
This paper deals with the bias optimality of multichain models for finite continuous-time Markov decision processes. Based on new performance difference formulas developed here, we prove the convergence of a so-called bias-optimal policy iteration algorithm, which can be used to obtain bias-optimal policies in a finite number of iterations.
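
For readers unfamiliar with the terminology, the following sketches the usual definitions of long-run average reward and bias for a stationary policy f in a finite continuous-time Markov decision process. The symbols $Q(f)$ (transition rate matrix), $r(f)$ (reward rate vector), $P(t,f)$, $P^*(f)$, $\eta^f$ and $g^f$ are assumptions borrowed from common textbook usage; the abstract does not give the paper's notation, and it may differ.

```latex
% Assumed notation (not taken from the paper): Q(f) is the generator under
% policy f, r(f) the reward rate vector, \eta^f the average reward, g^f the bias.
\begin{align*}
P(t,f) &= e^{tQ(f)}, \qquad
P^*(f) = \lim_{T\to\infty} \frac{1}{T}\int_0^T P(t,f)\,dt, \\
\eta^f &= P^*(f)\, r(f), \\
g^f &= \int_0^\infty \bigl[P(t,f) - P^*(f)\bigr]\, r(f)\,dt, \\
Q(f)\,\eta^f &= 0, \qquad
r(f) + Q(f)\, g^f = \eta^f, \qquad
P^*(f)\, g^f = 0.
\end{align*}
```

Under these definitions, a policy is usually called bias optimal if it is average optimal and, among all average-optimal policies, maximizes $g^f$ in every state. In the multichain case $\eta^f$ is a vector that may vary across states, which is why the condition $Q(f)\,\eta^f = 0$ appears alongside the Poisson equation.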