Policy iteration type algorithms for recurrent state Markov decision processes

  • Author: Stephen D. Patek
  • Affiliation: Department of Systems and Information Engineering, University of Virginia, 151 Engineers Way, P.O. Box 400747, Charlottesville, VA
  • Venue: Computers and Operations Research
  • Year: 2004

Abstract

We introduce and analyze several new policy iteration type algorithms for average cost Markov decision processes (MDPs). We limit attention to "recurrent state" processes, where there exists a state which is recurrent under all stationary policies, and our analysis applies to finite-state problems with compact constraint sets, continuous transition probability functions, and lower-semicontinuous cost functions. The analysis makes use of an underlying relationship between recurrent state MDPs and the so-called stochastic shortest path problems of Bertsekas and Tsitsiklis (Math. Oper. Res. 16(3) (1991) 580). After extending this relationship, we establish the convergence of the new policy iteration type algorithms either to optimality or to within ε > 0 of the optimal average cost.
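To make the setting concrete, the following is a minimal sketch of classical average-cost policy iteration for a finite MDP with a state `s0` that is recurrent under every stationary policy. This is the textbook baseline the abstract builds on, not the paper's new algorithms; the function name, array layout, and the trick of pinning the bias `h(s0) = 0` in the evaluation step are implementation assumptions.

```python
import numpy as np

def policy_iteration_avg_cost(P, c, s0=0, max_iter=100):
    """Textbook average-cost policy iteration (illustrative sketch).

    Assumes state s0 is recurrent under every stationary policy.
    P: array (A, S, S) of transition probabilities P[a, s, s'].
    c: array (S, A) of one-stage costs.
    Returns (lam, h, mu): optimal average cost, bias with h[s0] = 0,
    and the final stationary policy mu (array of action indices).
    """
    A, S, _ = P.shape
    mu = np.zeros(S, dtype=int)          # start from an arbitrary policy
    for _ in range(max_iter):
        # Policy evaluation: solve  lam + h(s) = c(s, mu(s)) + sum_t P h(t)
        # with h(s0) fixed to 0, so the unknowns are lam and h(s), s != s0.
        Pm = P[mu, np.arange(S), :]      # S x S transition matrix under mu
        cm = c[np.arange(S), mu]         # one-stage cost vector under mu
        M = np.eye(S) - Pm               # coefficients of the bias h
        M[:, s0] = 1.0                   # h(s0)=0, so reuse its column for lam
        sol = np.linalg.solve(M, cm)
        lam = sol[s0]                    # average cost of policy mu
        h = sol.copy()
        h[s0] = 0.0                      # bias, normalized at s0
        # Policy improvement: greedy w.r.t. Q(s,a) = c(s,a) + E[h(next)]
        Q = c + np.einsum('ast,t->sa', P, h)
        mu_new = np.argmin(Q, axis=1)
        if np.array_equal(mu_new, mu):   # no change => optimal policy
            break
        mu = mu_new
    return lam, h, mu
```

The recurrent-state assumption is what makes the evaluation step well posed: every policy induces a single recurrent class containing `s0`, so the average cost is independent of the starting state and the linear system above has a unique solution.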