Numerical analysis of continuous time Markov decision processes over finite horizons

  • Authors:
  • Peter Buchholz; Ingo Schulz

  • Affiliations:
  • Department of Computer Science, TU Dortmund, D-44221 Dortmund, Germany (both authors)

  • Venue:
  • Computers and Operations Research
  • Year:
  • 2011

Abstract

Continuous time Markov decision processes (CTMDPs) with a finite state and action space have been studied for a long time. It is known that under fairly general conditions the reward gained over a finite horizon can be maximized by a so-called piecewise constant policy, which changes only finitely often in a finite interval. Although this result has been available for more than 30 years, numerical approaches to computing the optimal policy and reward have been restricted to discretization methods, which are known to converge to the true solution only as the discretization step goes to zero. In this paper, we present a new method that is based on uniformization of the CTMDP and allows one to compute an ε-optimal policy up to a predefined precision in a numerically stable way using adaptive time steps.
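
To make the setting concrete, the following Python sketch applies uniformization to a tiny CTMDP and then integrates the finite-horizon optimality equation backward with a naive fixed-step scheme, reading off a piecewise constant policy from the per-step maximization. All generator matrices, reward rates, and the step size are made-up illustrations; the fixed-step discretization shown here is the kind of baseline approach the paper improves on, not the adaptive-step ε-optimal algorithm the paper proposes.

```python
import numpy as np

# Illustrative sketch only: a tiny 2-state, 2-action CTMDP with made-up
# generator matrices Q[a] and reward-rate vectors r[a].
Q = np.array([[[-3.0, 3.0], [1.0, -1.0]],    # generator under action 0
              [[-5.0, 5.0], [4.0, -4.0]]])   # generator under action 1
r = np.array([[2.0, 0.0],                    # reward rates under action 0
              [3.0, -1.0]])                  # reward rates under action 1

T = 1.0                                      # finite horizon [0, T]
Lam = max(-Q[a, i, i] for a in range(2) for i in range(2))  # uniformization rate
P = np.eye(2) + Q / Lam                      # uniformized transition matrices P[a]

h = 1.0 / (10.0 * Lam)                       # fixed time step (the paper adapts this)
steps = int(np.ceil(T / h))

V = np.zeros(2)                              # terminal values (no final reward)
policy = np.zeros((steps, 2), dtype=int)     # greedy action per step and state

# Backward integration of the optimality equation dV/dt = max_a (r_a + Q_a V),
# rewritten via uniformization as max_a (r_a + Lam * (P_a V - V)) and stepped
# with explicit Euler; a piecewise constant policy is read off the argmax.
for k in range(steps - 1, -1, -1):
    candidates = r + Lam * (P @ V - V)       # shape: (actions, states)
    policy[k] = np.argmax(candidates, axis=0)
    V = V + h * candidates.max(axis=0)

print("approximate optimal value per initial state:", V)
```

In this sketch the accuracy is tied to the fixed step h, whereas the method described in the abstract chooses the time steps adaptively and controls the error so that the computed policy is within a predefined ε of optimal.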