Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means of computing approximations for a given set of basis functions, allowing very large MDPs to be tackled as a result. However, several issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations? And where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value-function approximation, showing that this is an inherently hard problem. Nevertheless, we provide a branch-and-bound method for calculating the Bellman error and performing approximate policy iteration for general factored MDPs. These methods are more accurate than linear programming, but also more expensive. We then consider linear programming itself and investigate methods for automatically constructing sets of basis functions that allow this approach to produce good approximations. The techniques we develop are guaranteed to reduce L1 error, and can empirically reduce Bellman error as well.
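To make the central quantity concrete, the following sketch computes the Bellman error of a linear value-function approximation V = Φw on a tiny explicit MDP. This is purely illustrative and not the paper's factored algorithm: the MDP, the two basis functions, and the weight vector are all hypothetical, and the max-norm Bellman error is evaluated by direct enumeration rather than by the branch-and-bound method described in the abstract.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 6 states, 2 actions.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 2, 0.9

# Random transition matrices P[a] (each row sums to 1) and rewards R[a, s].
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_actions, n_states))

# Linear approximation V = Phi @ w with two assumed basis functions:
# a constant feature and one random feature, with an arbitrary weight vector.
Phi = np.column_stack([np.ones(n_states), rng.random(n_states)])
w = np.array([1.0, 0.5])
V = Phi @ w

# Bellman backup: T(V)(s) = max_a [ R(a, s) + gamma * sum_s' P(a, s, s') V(s') ]
TV = np.max(R + gamma * (P @ V), axis=0)

# Max-norm (L-infinity) Bellman error of the approximation.
bellman_error = np.max(np.abs(V - TV))
print(float(bellman_error))
```

In a factored MDP the state space is exponential in the number of variables, so neither `P` nor the maximization over states can be represented explicitly as above; that is what makes minimizing this quantity hard and motivates the structured methods the abstract describes.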