This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often yields an exponential reduction in the size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is showing how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming; the other uses approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses a max-norm projection, a technique that more directly minimizes the terms appearing in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing exponential gains in computation time on some problems.
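The LP decomposition mentioned in the abstract is analogous to variable elimination in Bayesian networks: the expensive operation is maximizing a sum of functions, each depending only on a small subset of the state variables, without enumerating the exponentially large joint state space. As a rough illustration of that idea only (the function name `eliminate_max`, the binary-variable restriction, and the toy factors below are my own, not the paper's code), the following sketch eliminates one variable at a time:

```python
from itertools import product

def eliminate_max(factors, order):
    """Compute max over all assignments of sum_i f_i(x_i) by variable
    elimination. factors: list of (scope, table) pairs, where scope is a
    tuple of variable names and table maps an assignment tuple (in scope
    order) to a value. order: elimination order covering every variable.
    Variables are assumed binary for simplicity."""
    factors = list(factors)
    for var in order:
        involved = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        # New factor's scope: all neighbours of var, minus var itself.
        scope = tuple(sorted({v for s, _ in involved for v in s if v != var}))
        table = {}
        for assign in product([0, 1], repeat=len(scope)):
            ctx = dict(zip(scope, assign))
            # Maximize out var given the context of its neighbours.
            table[assign] = max(
                sum(tab[tuple({**ctx, var: val}[v] for v in s)]
                    for s, tab in involved)
                for val in (0, 1)
            )
        factors.append((scope, table))
    # All variables eliminated: only empty-scope constants remain.
    return sum(tab[()] for _, tab in factors)

# Hypothetical two-factor example: maximize f1(x1, x2) + f2(x2, x3).
f1 = ((1, 2), {(0, 0): 1, (0, 1): 2, (1, 0): 0, (1, 1): 3})
f2 = ((2, 3), {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 4})
print(eliminate_max([f1, f2], order=[1, 3, 2]))  # -> 7
```

With n variables and small scopes, the cost is exponential only in the induced width of the elimination order, not in n, which is the same structural property the paper exploits to replace an exponentially large LP with an equivalent polynomial-sized one.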