Solving H-horizon, stationary Markov decision problems in time proportional to log(H)

Authors:
Paul Tseng
Affiliations:
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Venue:
Operations Research Letters
Year:
1990

Citing (10):
A new polynomial-time algorithm for linear programming. Combinatorica
Games against nature. Journal of Computer and System Sciences
Dynamic programming: deterministic and stochastic models
Combinatorial optimization: algorithms and complexity
Parallel complexity theory
The complexity of Markov decision processes. Mathematics of Operations Research
Parallel and distributed computation: numerical methods
Progress in Mathematical Programming: Interior-point and related methods
Computers and Intractability: A Guide to the Theory of NP-Completeness
The complexity of dynamic languages and dynamic optimization problems. STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing

Cited by (4):
Reachability analysis of uncertain systems using bounded-parameter Markov decision processes. Artificial Intelligence
Planning and acting in partially observable stochastic domains. Artificial Intelligence
A survey of computational complexity results in systems and control. Automatica (Journal of IFAC)
Parallel Abductive Query Answering in Probabilistic Logic Programs. ACM Transactions on Computational Logic (TOCL)

Abstract

We consider the H-horizon, stationary Markov decision problem. For the discounted case, we give an ε-approximation algorithm whose time is proportional to log(1/ε), log(H) and 1/(1 - α), where α is the discount factor. For problems where α is bounded away from 1, we obtain, respectively, a fully polynomial approximation scheme and a polynomial-time algorithm. For the undiscounted case, by refining a weighted maximum norm contraction result of Hoffman, we derive analogous results under the assumption that all stationary policies are proper.
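
For orientation, the H-horizon discounted problem can be written as the standard backward dynamic programming recursion sketched below. This is a minimal illustration and not the algorithm of the paper: it takes H iterations rather than time proportional to log(H). The names `finite_horizon_values`, `P`, `g`, and `alpha`, the cost-minimization convention, and the zero terminal cost are all assumptions made for the example. What the sketch does show is the contraction property behind such results: successive stage values differ by at most alpha^k times the largest one-stage cost, so long horizons change the answer very little when alpha is bounded away from 1.

```python
def finite_horizon_values(P, g, alpha, H):
    """Backward recursion for an H-horizon discounted MDP (cost minimization).

    P[a][i][j]: probability of moving from state i to state j under action a.
    g[a][i]:    expected one-stage cost of taking action a in state i.
    alpha:      discount factor, 0 <= alpha < 1.
    Returns the optimal H-stage cost for each starting state.
    """
    num_actions = len(g)
    num_states = len(g[0])
    v = [0.0] * num_states  # zero terminal cost (an assumption of this sketch)
    for _ in range(H):
        # One dynamic programming step: best action per state given v.
        v = [
            min(
                g[a][i] + alpha * sum(P[a][i][j] * v[j] for j in range(num_states))
                for a in range(num_actions)
            )
            for i in range(num_states)
        ]
    return v
```

Because each step is a contraction with modulus alpha in the maximum norm, the k-stage and (k+1)-stage values differ by at most alpha^k times the largest absolute one-stage cost.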