Piecewise linear value function approximation for factored MDPs

Authors:
Pascal Poupart; Craig Boutilier; Relu Patrascu; Dale Schuurmans
Affiliations:
Dept. of Computer Science, University of Toronto, Toronto, ON, M5S 3H5; Dept. of Computer Science, University of Toronto, Toronto, ON, M5S 3H5; Department of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1; Department of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1
Venue:
Eighteenth National Conference on Artificial Intelligence
Year:
2002

Abstract

A number of proposals have been put forth in recent years for the solution of Markov decision processes (MDPs) whose state (and sometimes action) spaces are factored. One recent class of methods involves linear value function approximation, where the optimal value function is assumed to be a linear combination of some set of basis functions, with the aim of finding suitable weights. While sophisticated techniques have been developed for finding the best approximation within this constrained space, few methods have been proposed for choosing a suitable basis set, or modifying it if solution quality is found wanting. We propose a general framework, and specific proposals, that address both of these questions. In particular, we examine weakly coupled MDPs where a number of subtasks can be viewed independently modulo resource constraints. We then describe methods for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques. We argue that this architecture is suitable for many types of MDPs whose combinatorics are determined largely by the existence of multiple conflicting objectives.
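
The contrast between a single linear combination and a piecewise linear combination of subtask value functions can be illustrated with a small sketch. It is not the authors' implementation: the state variables (`task_a_done`, `task_b_done`, `resource_scarce`), the two subtask value functions, the tree shape, and the weights are all hypothetical and hand-picked, and the greedy tree-growing and weight-fitting steps mentioned in the abstract are not shown. The sketch only shows how a decision tree over state variables could select a different weight vector in each region of the state space.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

State = Dict[str, bool]
Basis = Callable[[State], float]


@dataclass
class Leaf:
    weights: List[float]  # one weight per subtask value function


@dataclass
class Split:
    variable: str         # boolean state variable tested at this node
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def linear_value(state: State, basis: List[Basis], weights: List[float]) -> float:
    """Plain linear approximation: a weighted sum of basis functions."""
    return sum(w * b(state) for w, b in zip(weights, basis))


def piecewise_value(state: State, basis: List[Basis], tree: Node) -> float:
    """Walk the tree to a leaf, then apply that leaf's weight vector."""
    node = tree
    while isinstance(node, Split):
        node = node.if_true if state[node.variable] else node.if_false
    return linear_value(state, basis, node.weights)


# Hypothetical subtask value functions (assumed, not taken from the paper).
basis = [
    lambda s: 10.0 if s["task_a_done"] else 4.0,
    lambda s: 6.0 if s["task_b_done"] else 2.0,
]

# Hypothetical tree: when the shared resource is scarce, only subtask A counts;
# otherwise the two subtask values are simply added.
tree = Split("resource_scarce", Leaf([1.0, 0.0]), Leaf([1.0, 1.0]))

scarce = {"task_a_done": False, "task_b_done": True, "resource_scarce": True}
plentiful = {"task_a_done": False, "task_b_done": True, "resource_scarce": False}

value_scarce = piecewise_value(scarce, basis, tree)
value_plentiful = piecewise_value(plentiful, basis, tree)
```

With these made-up numbers the two states differ only in the resource variable, yet they receive different values because the tree routes them to leaves with different weights. A single global weight vector applied to the same two basis functions could not make that distinction.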