Feature-based methods for large scale dynamic programming
Machine Learning - Special issue on reinforcement learning
Solving very large weakly coupled Markov decision processes
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
How to dynamically merge Markov decision processes
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Improved switching among temporally abstract actions
Proceedings of the 1998 conference on Advances in neural information processing systems 11
Learning and value function approximation in complex decision processes
Tractable inference for complex stochastic processes
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence
Spatiotemporal Abstraction of Stochastic Sequential Processes
Proceedings of the 5th International Symposium on Abstraction, Reformulation and Approximation
Exploiting Additive Structure in Factored MDPs for Reinforcement Learning
Recent Advances in Reinforcement Learning
Model-based least-squares policy evaluation
AI'03 Proceedings of the 16th Canadian society for computational studies of intelligence conference on Advances in artificial intelligence
Distributed planning in hierarchical factored MDPs
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
Policy iteration for factored MDPs
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Decision-theoretic planning with concurrent temporally extended actions
UAI'01 Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence
APPSSAT: approximate probabilistic planning using stochastic satisfiability
ECSQARU'05 Proceedings of the 8th European conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
Scalable multiagent planning using probabilistic inference
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Approximate solutions for factored Dec-POMDPs with many agents
Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Automated generation of interaction graphs for value-factored dec-POMDPs
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Construction of approximation spaces for reinforcement learning
The Journal of Machine Learning Research
Many large Markov decision processes (MDPs) can be represented compactly using a structured representation such as a dynamic Bayesian network. Unfortunately, the compact representation does not help standard MDP algorithms, because the value function for the MDP does not retain the structure of the process description. We argue that in many such MDPs, structure is approximately retained. That is, the value functions are nearly additive: closely approximated by a linear function over factors associated with small subsets of problem features. Based on this idea, we present a convergent, approximate value determination algorithm for structured MDPs. The algorithm maintains an additive value function, alternating dynamic programming steps with steps that project the result back into the restricted space of additive functions. We show that both the dynamic programming and the projection steps can be computed efficiently, despite the fact that the number of states is exponential in the number of state variables.
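The alternation the abstract describes, a dynamic programming (Bellman) backup followed by a projection back onto the space of additive value functions, can be illustrated with a brute-force sketch. All names, sizes, and the random MDP below are illustrative assumptions, not from the paper; in particular, the paper's contribution is that both steps exploit the factored structure and never enumerate the state space, whereas this sketch enumerates all states to keep the idea visible.

```python
import numpy as np

# Hypothetical tiny MDP: 3 binary state variables -> 8 states, with a fixed
# policy already folded in, so value determination means solving
# V = r + gamma * P V. (P, r, and the basis are made up for illustration.)
rng = np.random.default_rng(0)
n_states, gamma = 8, 0.8
P = rng.random((n_states, n_states)) + 1.0
P /= P.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
r = rng.random(n_states)

# Additive value function: V(s) ~= sum_j w_j * h_j(s), where each basis
# function h_j depends on a small subset of state variables. Here: a
# constant term plus one indicator per state bit.
def basis(n):
    states = np.arange(n)
    bits = ((states[:, None] >> np.arange(3)) & 1).astype(float)  # 8 x 3
    return np.hstack([np.ones((n, 1)), bits])                     # 8 x 4

H = basis(n_states)

# Approximate value determination: alternate a DP backup with an L2
# projection (least squares) back onto the span of the additive basis.
w = np.zeros(H.shape[1])
for _ in range(500):
    backup = r + gamma * P @ (H @ w)                 # Bellman backup of H w
    w, *_ = np.linalg.lstsq(H, backup, rcond=None)   # project onto span(H)

V_approx = H @ w
```

The resulting `w` is a fixed point of the projected Bellman operator; the quality of `V_approx` depends on how nearly additive the true value function is, which is exactly the structural assumption the paper argues holds for many factored MDPs.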