In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. First, the least-squares projection operator is modified so that it does not increase max-norm, and thus preserves convergence. Second, we draw polynomially many samples uniformly from the (exponentially large) state space. This way, the complexity of our algorithm becomes polynomial in the size of the fMDP description. We prove that the algorithm is convergent. We also derive an upper bound on the difference between our approximate solution and the optimal one, as well as on the error introduced by sampling. We analyse various projection operators with respect to their computational complexity and their convergence when combined with approximate value iteration.
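The two modifications described in the abstract can be illustrated with a small sketch. Below is a hedged, illustrative NumPy implementation, not the paper's exact algorithm: the function names, the choice of basis matrix `H`, and the specific normalization are assumptions. One plausible way to make the least-squares projection a max-norm non-expansion is to scale each row of the projection matrix by its ℓ1-norm, so that the combined Bellman-backup-plus-projection operator remains a contraction.

```python
import numpy as np

def max_norm_safe_projection(H):
    """Least-squares projection onto span(H), normalized row-wise.

    P = H (H^T H)^{-1} H^T is the ordinary least-squares projector.
    Dividing each row by its l1-norm guarantees ||P||_inf <= 1, so
    applying P never increases the max-norm of a vector. (Illustrative
    normalization; the paper may use a different scheme.)
    """
    P = H @ np.linalg.solve(H.T @ H, H.T)
    row_norms = np.abs(P).sum(axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1.0  # avoid division by zero
    return P / row_norms

def approximate_value_iteration(P_trans, R, H, gamma=0.9, iters=200):
    """Projected value iteration: v <- Proj[ max_a (R_a + gamma * P_a v) ].

    P_trans: (A, S, S) transition matrices, R: (A, S) rewards,
    H: (S, k) feature/basis matrix. In the full fMDP setting, S would be
    replaced by a polynomial-size uniform sample of the state space; here
    we enumerate the (small) state set for clarity.
    """
    Proj = max_norm_safe_projection(H)
    num_actions, num_states = R.shape
    v = np.zeros(num_states)
    for _ in range(iters):
        # Bellman backup over all actions, then max-norm-safe projection.
        q = np.stack([R[a] + gamma * P_trans[a] @ v for a in range(num_actions)])
        v = Proj @ q.max(axis=0)
    return v
```

Because the projection is a max-norm non-expansion and the Bellman backup is a gamma-contraction in max-norm, their composition is a gamma-contraction, so the iteration converges to a unique fixed point regardless of initialization.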