Topological value iteration algorithm for Markov decision processes

Authors:
Peng Dai;Judy Goldsmith
Affiliations:
Computer Science Dept., University of Kentucky, Lexington, KY;Computer Science Dept., University of Kentucky, Lexington, KY
Venue:
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Year:
2007

Abstract

Value Iteration (VI) is an inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. Many approaches have been proposed to overcome this problem. Among them, LAO*, LRTDP and HDP are the state-of-the-art ones. All of these use reachability analysis and heuristics to avoid some unnecessary backups. However, none of these approaches fully exploits the graphical features of the MDPs or uses these features to yield the best backup sequence of the state space. We introduce an algorithm named Topological Value Iteration (TVI) that circumvents the problem of unnecessary backups by detecting the structure of MDPs and backing up states based on topological sequences. We prove that the backup sequence TVI applies is optimal. Our experimental results show that TVI outperforms VI, LAO*, LRTDP and HDP on our benchmark MDPs.
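The idea of backing up states along a topological sequence can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation, and it rests on several assumptions that the abstract does not state: "structure" is taken to mean the strongly connected components of the transition graph (found here with Tarjan's algorithm), the objective is to minimize expected cost, goal states are encoded as states with no actions, and every state can reach a goal. The `mdp` dictionary layout and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: mdp[state][action] = [(probability, next_state, cost), ...]
# A state with an empty action dict is treated as a goal with value 0.
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def strongly_connected_components(mdp: MDP) -> List[List[str]]:
    """Tarjan's algorithm. Components are emitted in reverse topological
    order: each one appears only after every component it can reach."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack = set()
    components: List[List[str]] = []

    def successors(s: str):
        return {t for outcomes in mdp[s].values() for _, t, _ in outcomes}

    def visit(s: str) -> None:
        index[s] = low[s] = len(index)
        stack.append(s)
        on_stack.add(s)
        for t in successors(s):
            if t not in index:
                visit(t)
                low[s] = min(low[s], low[t])
            elif t in on_stack:
                low[s] = min(low[s], index[t])
        if low[s] == index[s]:  # s is the root of a component
            comp = []
            while True:
                t = stack.pop()
                on_stack.discard(t)
                comp.append(t)
                if t == s:
                    break
            components.append(comp)

    for s in mdp:
        if s not in index:
            visit(s)
    return components


def topological_value_iteration(mdp: MDP, gamma: float = 1.0,
                                epsilon: float = 1e-6) -> Dict[str, float]:
    """Run value iteration one component at a time, successors first, so a
    component is solved only after everything it depends on is final."""
    values = {s: 0.0 for s in mdp}
    for comp in strongly_connected_components(mdp):
        while True:
            residual = 0.0
            for s in comp:
                if not mdp[s]:  # goal state: nothing to back up
                    continue
                best = min(
                    sum(p * (c + gamma * values[t]) for p, t, c in outcomes)
                    for outcomes in mdp[s].values()
                )
                residual = max(residual, abs(best - values[s]))
                values[s] = best
            if residual < epsilon:
                break
    return values


# Toy MDP (made up for illustration): A and B form a cycle, C leads to goal G.
example: MDP = {
    "A": {"a": [(0.5, "B", 1.0), (0.5, "C", 1.0)]},
    "B": {"b": [(0.5, "A", 1.0), (0.5, "C", 1.0)]},
    "C": {"go": [(1.0, "G", 1.0)]},
    "G": {},
}
values = topological_value_iteration(example)
print({s: round(v, 3) for s, v in values.items()})
```

In this toy example, G and C each form their own component and are settled after a single pass, and only the cyclic component {A, B} needs repeated backups. Plain value iteration would instead sweep all four states on every pass.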