Solving uncertain Markov decision problems: an interval-based method

Authors:
Shulin Cui;Jigui Sun;Minghao Yin;Shuai Lu
Affiliations:
College of Software, Jilin University, Changchun, China;College of Computer Science and Technology, Jilin University, Changchun, China;College of Computer Science and Technology, Jilin University, Changchun, China;College of Computer Science and Technology, Jilin University, Changchun, China
Venue:
ICNC'06 Proceedings of the Second international conference on Advances in Natural Computation - Volume Part II
Year:
2006

Citing 7

Bounded-parameter Markov decision process. Artificial Intelligence.
Optimizing decision trees through heuristically guided search. Communications of the ACM.
LAO*: a heuristic search algorithm that finds solutions with loops. Artificial Intelligence - Special issue on heuristic search in artificial intelligence.
Dynamic Programming and Optimal Control.
Neuro-Dynamic Programming.
Robust planning with (L)RTDP. IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence.
Learning to act using real-time dynamic programming. Artificial Intelligence.

Cited 2

Efficient solutions to factored MDPs with imprecise transition probabilities. Artificial Intelligence.
Using mathematical programming to solve Factored Markov Decision Processes with Imprecise Probabilities. International Journal of Approximate Reasoning.

Quantified Score

Hi-index	0.05

Abstract

Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be solved efficiently with value iteration (VI), policy iteration (PI), RTDP, LAO*, and similar algorithms. However, in many practical problems the estimates of the transition probabilities are far from accurate. In this paper, we represent uncertain transition probabilities as closed real intervals. We also describe a general algorithm, called gLAO*, that can solve uncertain MDPs efficiently. We demonstrate that Buffet and Aberdeen's approach, which searches for the best policy under the worst model, is a special case of our approach. Experiments show that gLAO* inherits the excellent performance of LAO* when solving uncertain MDPs.
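To make the "best policy under the worst model" idea concrete, the following is a minimal Python sketch of one pessimistic Bellman backup over interval transition probabilities. It is not gLAO* and is not taken from the paper: it uses the standard greedy construction for interval-bounded models, and the function names (`worst_case_distribution`, `robust_backup`), the state names, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]  # closed interval [low, high] for one transition probability


def worst_case_distribution(
    intervals: Dict[str, Interval], cost_to_go: Dict[str, float]
) -> Dict[str, float]:
    """Return the distribution inside the intervals that maximizes expected cost."""
    # Start every successor at its lower bound.
    probs = {s: low for s, (low, _high) in intervals.items()}
    slack = 1.0 - sum(probs.values())
    if slack < -1e-12:
        raise ValueError("lower bounds sum to more than 1")
    # Hand the remaining probability mass to the most costly successors first.
    for s in sorted(intervals, key=lambda state: cost_to_go[state], reverse=True):
        low, high = intervals[s]
        extra = min(high - low, slack)
        probs[s] += extra
        slack -= extra
    if slack > 1e-12:
        raise ValueError("upper bounds sum to less than 1")
    return probs


def robust_backup(
    actions: Dict[str, Tuple[float, Dict[str, Interval]]],
    cost_to_go: Dict[str, float],
) -> Tuple[str, float]:
    """Pick the action with the lowest expected cost under its worst-case model."""
    best_action, best_value = "", float("inf")
    for name, (immediate_cost, intervals) in actions.items():
        p = worst_case_distribution(intervals, cost_to_go)
        q = immediate_cost + sum(p[s] * cost_to_go[s] for s in p)
        if q < best_value:
            best_action, best_value = name, q
    return best_action, best_value


if __name__ == "__main__":
    # Hypothetical two-successor example: reaching "goal" costs nothing further,
    # landing in "swamp" has an estimated cost-to-go of 10.
    cost_to_go = {"goal": 0.0, "swamp": 10.0}
    actions = {
        "safe": (2.0, {"goal": (0.7, 0.9), "swamp": (0.1, 0.3)}),
        "risky": (1.0, {"goal": (0.5, 0.95), "swamp": (0.05, 0.5)}),
    }
    print(robust_backup(actions, cost_to_go))
```

In this toy instance the pessimistic backup prefers "safe", whereas evaluating each action at the interval midpoints would favor "risky". A full solver would apply such a backup repeatedly inside a search or dynamic-programming loop. How gLAO* chooses a model within the intervals is not specified in the abstract, so the pure worst-case choice here is an assumption.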