Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes
Discrete Event Dynamic Systems
From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning
Discrete Event Dynamic Systems
The optimal robust control policy for uncertain semi-Markov control processes
International Journal of Systems Science
Automatica (Journal of IFAC)
Cao's work shows that, by defining an α-dependent equivalent infinitesimal generator Aα, a semi-Markov decision process (SMDP) with either the average- or the discounted-cost criterion can be treated as an α-equivalent Markov decision process (MDP), and that the performance potential theory can likewise be developed for SMDPs. In this work, we focus on establishing error bounds for potential- and Aα-based iterative optimization methods. First, we introduce an α-uniformized Markov chain (UMC) for an SMDP via Aα and a uniformization parameter, and establish the relations between the two processes. In particular, we show that their performance potentials, as solutions of the corresponding Poisson equations, are proportional, so that potential-based studies of an SMDP and of its α-UMC are unified. Using these relations, we derive error bounds for a potential-based policy-iteration algorithm and for a value-iteration algorithm in the presence of various computation errors. The results apply directly to the special cases of continuous-time MDPs and Markov chains, and extend to simulation-based optimization methods such as reinforcement learning and neuro-dynamic programming, where estimation and approximation errors are common. Finally, we present an application example on the look-ahead control of a conveyor-serviced production station (CSPS) and derive the corresponding error bounds.
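As background for the uniformization step in the abstract, the following is a minimal numerical sketch of the generic average-cost construction: a continuous-time generator A is embedded into a discrete-time chain P = I + A/λ, the potentials g are obtained from the Poisson equation (I − P)g + η·1 = f, and one policy-improvement sweep then uses g. All model data here (two states, two actions, the generators, cost rates, and λ) are invented for illustration, and this plain uniformization merely stands in for the paper's Aα-based α-UMC, which additionally covers the discounted criterion and general SMDP sojourn-time distributions.

```python
import numpy as np

# Toy 2-state, 2-action continuous-time MDP (all numbers invented).
# A[a]: infinitesimal generator under action a (rows sum to zero).
# c[a]: cost rates under action a.
A = {0: np.array([[-3.0, 3.0], [2.0, -2.0]]),
     1: np.array([[-1.0, 1.0], [4.0, -4.0]])}
c = {0: np.array([2.0, 1.0]),
     1: np.array([1.0, 3.0])}
lam = 5.0  # uniformization rate, chosen >= every exit rate


def uniformize(gen):
    """Embed a generator into a DTMC transition matrix: P = I + gen/lam."""
    return np.eye(gen.shape[0]) + gen / lam


def evaluate(policy):
    """Average cost eta and potential g (normalized by g[0] = 0) of a
    stationary policy, from the Poisson equation (I - P) g + eta*1 = f."""
    n = len(policy)
    P = np.array([uniformize(A[a])[s] for s, a in enumerate(policy)])
    f = np.array([c[a][s] for s, a in enumerate(policy)])
    # Unknowns are (g[1], ..., g[n-1], eta); g[0] is pinned to 0.
    M = np.column_stack([(np.eye(n) - P)[:, 1:], np.ones(n)])
    x = np.linalg.solve(M, f)
    g = np.concatenate([[0.0], x[:-1]])
    return x[-1], g


def improve(g):
    """One policy-improvement sweep: minimize f_a + P_a g in each state."""
    return [min((0, 1), key=lambda a: c[a][s] + uniformize(A[a])[s] @ g)
            for s in range(2)]


policy = [0, 0]
for _ in range(10):  # potential-based policy iteration
    eta, g = evaluate(policy)
    new = improve(g)
    if new == policy:
        break
    policy = new
```

In the exact-arithmetic setting this loop converges to a stable policy; the error bounds developed in the paper concern precisely how the fixed point degrades when `evaluate` (potential estimation) or `improve` (the minimization) is carried out with computation, estimation, or approximation errors.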