Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a value function, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution algorithms have been proposed, many of which use a linear combination of basis functions as a compact approximation to the value function. Almost all of these algorithms use an approximation based on the (weighted) L2-norm (Euclidean distance); this approach prevents the application of standard convergence results for MDP algorithms, all of which are based on the max-norm. This paper makes two contributions. First, it presents the first approximate MDP solution algorithms, for both value iteration and policy iteration, that use max-norm projection, thereby directly optimizing the quantity required to obtain the best error bounds. Second, it shows how these algorithms can be applied efficiently in the context of factored MDPs, where the transition model is specified using a dynamic Bayesian network.
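A minimal sketch of the core idea, under simplifying assumptions (this is a toy illustration, not the paper's algorithm): approximate value iteration in which each Bellman backup is followed by a max-norm projection of the backed-up values onto the span of the basis functions. With a single constant basis function phi(s) = 1, the projection argmin_w max_s |w - v(s)| has the closed form (min v + max v) / 2, so no linear program is needed. The toy two-state MDP below is hypothetical, and the final check is the standard max-norm guarantee that a value function with Bellman residual eps is within eps / (1 - gamma) of the optimum.

```python
GAMMA = 0.9
N_STATES = 2

# Hypothetical deterministic two-state MDP:
# action 0 = stay, action 1 = switch states; reward depends only on the state.
REWARD = [0.0, 1.0]

def next_state(s, a):
    return s if a == 0 else 1 - s

def bellman_backup(v):
    """One exact Bellman backup T[v], evaluated at every state."""
    return [max(REWARD[s] + GAMMA * v[next_state(s, a)] for a in (0, 1))
            for s in range(N_STATES)]

def maxnorm_project_constant(v):
    """Max-norm projection onto the span of the constant basis phi(s) = 1:
    argmin_w max_s |w - v(s)| = (min(v) + max(v)) / 2."""
    return (min(v) + max(v)) / 2.0

# Approximate value iteration: backup, then max-norm projection.
w = 0.0
for _ in range(300):
    w = maxnorm_project_constant(bellman_backup([w] * N_STATES))

# Exact value iteration, for comparison against the approximation.
v = [0.0] * N_STATES
for _ in range(300):
    v = bellman_backup(v)

# Bellman residual of the approximation, and its true max-norm error.
eps = max(abs(t - w) for t in bellman_backup([w] * N_STATES))
err = max(abs(v[s] - w) for s in range(N_STATES))
# Standard max-norm bound: err <= eps / (1 - GAMMA).
```

For this toy chain the approximate fixed point is w = 5 while the true values are V* = (9, 10), so the error of 5 exactly saturates the eps / (1 - gamma) bound with eps = 0.5; this is the kind of guarantee that L2-based projections do not directly provide, which is the motivation for projecting in the max-norm.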