Greedy linear value-approximation for factored Markov decision processes

  • Authors:
  • Relu Patrascu; Pascal Poupart; Dale Schuurmans; Craig Boutilier; Carlos Guestrin

  • Affiliations:
  • University of Waterloo; University of Toronto; University of Waterloo; University of Toronto; Stanford University

  • Venue:
  • Eighteenth National Conference on Artificial Intelligence
  • Year:
  • 2002

Abstract

Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to compute approximations for a given set of basis functions, and has tackled very large MDPs as a result. However, a number of issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations? And where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximation, showing that this is an inherently hard problem. Nevertheless, we provide a branch-and-bound method for calculating Bellman error and performing approximate policy iteration for general factored MDPs. These methods are more accurate than linear programming, but more expensive. We then consider linear programming itself and investigate methods for automatically constructing sets of basis functions that allow this approach to produce good approximations. The techniques we develop are guaranteed to reduce L1 error, but can also empirically reduce Bellman error.
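
As a rough illustration (not taken from the paper), the approximate linear programming formulation the abstract refers to can be sketched on a tiny flat MDP: choose basis functions h_i, then minimize a weighted sum of the approximate values subject to Bellman constraints. The MDP below, the state-relevance weights, and the indicator basis are all hypothetical choices made only to keep the example runnable.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action MDP (purely illustrative).
gamma = 0.9
P = np.array([[[0.8, 0.2],    # P[a, s, s'] for action 0
               [0.3, 0.7]],
              [[0.5, 0.5],    # action 1
               [0.9, 0.1]]])
R = np.array([[1.0, 0.0],     # R[a, s]
              [0.5, 2.0]])

# Basis functions h_i(s): here one indicator per state (an exact basis).
H = np.eye(2)                 # H[s, i]

# Approximate linear program:
#   minimize   sum_s alpha(s) * sum_i w_i h_i(s)
#   subject to sum_i w_i h_i(s) >= R(s,a) + gamma * sum_{s'} P(s'|s,a) sum_i w_i h_i(s')
alpha = np.ones(2) / 2        # state-relevance weights (assumed uniform)
c = alpha @ H                 # objective coefficients for the weights w

A_ub, b_ub = [], []
for a in range(2):
    for s in range(2):
        # Rewrite the constraint as -(H[s] - gamma * P[a, s] @ H) w <= -R[a, s]
        A_ub.append(-(H[s] - gamma * P[a, s] @ H))
        b_ub.append(-R[a, s])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 2)
w = res.x
V = H @ w                     # approximate value function

# Bellman error of the result (near zero here, since the basis is exact).
TV = np.max(R + gamma * P @ V, axis=0)
print("weights:", w, "Bellman error:", np.max(np.abs(TV - V)))
```

With an incomplete basis the same program still yields a feasible upper bound on the optimal values, but its Bellman error can be large; measuring and reducing that error for factored MDPs is the problem the abstract addresses.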