Error bounds of optimization algorithms for semi-Markov decision processes

  • Authors:
  • Tang Hao; Yin Baoqun; Xi Hongsheng

  • Affiliations:
  • School of Computer and Information, Hefei University of Technology, Hefei, Anhui, P.R. China; Department of Automation, University of Science and Technology of China, Hefei, Anhui, P.R. China; Department of Automation, University of Science and Technology of China, Hefei, Anhui, P.R. China

  • Venue:
  • International Journal of Systems Science
  • Year:
  • 2007

Abstract

Cao's work shows that, by defining an α-dependent equivalent infinitesimal generator Aα, a semi-Markov decision process (SMDP) with both average- and discounted-cost criteria can be treated as an α-equivalent Markov decision process (MDP), and that performance potential theory can also be developed for SMDPs. In this work, we focus on establishing error bounds for potential- and Aα-based iterative optimization methods. First, we introduce an α-uniformized Markov chain (α-UMC) for an SMDP via Aα and a uniformization parameter, and establish the relations between the two. In particular, we show that their performance potentials, as solutions of the corresponding Poisson equations, are proportional, so that potential-based analyses of an SMDP and of its α-UMC are unified. Using these relations, we derive error bounds for a potential-based policy-iteration algorithm and a value-iteration algorithm, respectively, in the presence of various calculation errors. The results apply directly to special models such as continuous-time MDPs and Markov chains, and can be extended to simulation-based optimization methods such as reinforcement learning and neuro-dynamic programming, where estimation and approximation errors are common. Finally, we present an application example, the look-ahead control of a conveyor-serviced production station (CSPS), and report the corresponding error bounds.
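To make the value-iteration setting concrete, the sketch below runs discounted-cost value iteration on a small MDP of the kind obtained after uniformizing an SMDP. The transition matrices, costs, and discount factor are hypothetical, and the stopping rule uses the standard contraction bound (stop when successive iterates differ by less than ε(1−γ)/(2γ), which guarantees the iterate is within ε of the optimum in sup-norm), not the paper's SMDP-specific error bounds.

```python
import numpy as np

# Hypothetical 3-state, 2-action model after uniformization:
# P[a] is the transition matrix of the uniformized chain under action a,
# c[a] the one-step cost vector, gamma the discount factor induced by alpha.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],  # action 0
    [[0.5, 0.4, 0.1], [0.3, 0.5, 0.2], [0.1, 0.2, 0.7]],  # action 1
])
c = np.array([
    [2.0, 1.0, 3.0],   # costs under action 0
    [1.5, 2.5, 0.5],   # costs under action 1
])
gamma = 0.9

def value_iteration(P, c, gamma, eps=1e-6):
    """Discounted-cost value iteration; the threshold on successive
    iterates guarantees ||V - V*||_inf <= eps by the contraction bound."""
    n = P.shape[1]
    V = np.zeros(n)
    threshold = eps * (1 - gamma) / (2 * gamma)
    while True:
        # Q[a, s] = c[a, s] + gamma * sum_j P[a, s, j] * V[j]
        Q = c + gamma * P @ V
        V_new = Q.min(axis=0)            # cost-minimizing Bellman update
        if np.max(np.abs(V_new - V)) < threshold:
            return V_new, Q.argmin(axis=0)
        V = V_new

V_star, policy = value_iteration(P, c, gamma)
print(np.round(V_star, 3), policy)
```

The same loop with a larger, deliberately injected error in the update would illustrate the kind of perturbation the paper's bounds quantify.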