Discovering hidden structure in factored MDPs

Authors:
Andrey Kolobov; Mausam; Daniel S. Weld
Affiliations:
Dept. of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States; Dept. of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States; Dept. of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
Venue:
Artificial Intelligence
Year:
2012

Abstract

Markov Decision Processes (MDPs) describe a wide variety of planning scenarios ranging from military operations planning to controlling a Mars rover. However, today's solution techniques scale poorly, limiting MDPs' practical applicability. In this work, we propose algorithms that automatically discover and exploit the hidden structure of factored MDPs. Doing so helps solve MDPs faster and with less memory than state-of-the-art techniques. Our algorithms discover two complementary state abstractions: basis functions and nogoods. A basis function is a conjunction of literals; if the conjunction holds true in a state, this guarantees the existence of at least one trajectory to the goal. Conversely, a nogood is a conjunction whose presence implies the non-existence of any such trajectory, meaning the state is a dead end. We compute basis functions by regressing goal descriptions through a determinized version of the MDP. Nogoods are constructed with a novel machine learning algorithm that uses basis functions as training data. Our state abstractions can be leveraged in several ways. We describe three diverse approaches: GOTH, a heuristic function for use in heuristic search algorithms such as RTDP; ReTrASE, an MDP solver that performs modified Bellman backups on basis functions instead of states; and SixthSense, a method to quickly detect dead-end states. In essence, our work integrates ideas from deterministic planning and basis function-based approximation, leading to methods that outperform existing approaches by a wide margin.
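
To make the notion of a basis function concrete, the following is a minimal Python sketch of regressing a goal conjunction through a single deterministic action, which is the elementary step the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: the encoding of literals as `(variable, value)` pairs, the `DetAction` type, the `regress` and `holds` helpers, and the toy `move` action are all hypothetical names introduced here. It also simplifies textbook regression by not checking that the action actually achieves part of the target.

```python
from typing import Dict, FrozenSet, NamedTuple, Optional, Tuple

Literal = Tuple[str, bool]        # (variable name, required truth value); assumed encoding
Conjunction = FrozenSet[Literal]  # a set of literals, read as their logical AND


class DetAction(NamedTuple):
    """One deterministic outcome of a probabilistic action (hypothetical type)."""
    name: str
    precondition: Conjunction
    effect: Conjunction


def holds(conj: Conjunction, state: Dict[str, bool]) -> bool:
    """True if the state satisfies every literal; missing variables count as False."""
    return all(state.get(var, False) == val for var, val in conj)


def regress(target: Conjunction, action: DetAction) -> Optional[Conjunction]:
    """Return a conjunction that, if true before `action`, makes `target` true after.

    Returns None when the action's effect contradicts the target, or when the
    leftover target literals are inconsistent with the action's precondition.
    """
    effect = dict(action.effect)
    needed: Dict[str, bool] = {}
    for var, val in target:
        if var in effect:
            if effect[var] != val:
                return None        # the action destroys this target literal
        else:
            needed[var] = val      # must already hold before the action
    for var, val in action.precondition:
        if needed.get(var, val) != val:
            return None            # precondition clashes with a leftover literal
        needed[var] = val
    return frozenset(needed.items())


# Toy example (all names are illustrative assumptions).
goal: Conjunction = frozenset({("at_goal", True)})
move = DetAction(
    name="move",
    precondition=frozenset({("at_start", True), ("has_fuel", True)}),
    effect=frozenset({("at_goal", True), ("at_start", False)}),
)

basis_function = regress(goal, move)
state = {"at_start": True, "has_fuel": True, "at_goal": False}
print(basis_function is not None and holds(basis_function, state))
```

In this toy setting the regressed conjunction plays the role of a basis function: any state in which it holds has at least one trajectory to the goal in the determinized problem. Repeating the regression step along a longer deterministic plan would yield further conjunctions of the same kind.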