Optimistic initialization and greediness lead to polynomial time learning in factored MDPs
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an empirical model of the FMDP in a conventional way and always follows a greedy policy with respect to this model. The only trick of the algorithm is that the model is initialized optimistically. We prove that with suitable initialization (i) FOIM converges to the fixed point of approximate value iteration (AVI); (ii) the number of steps on which the agent makes non-near-optimal decisions (with respect to the solution of AVI) is polynomial in all relevant quantities; and (iii) the per-step costs of the algorithm are also polynomial. To the best of our knowledge, FOIM is the first algorithm with all of these properties.
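Since the abstract describes FOIM only at a high level, the following minimal sketch may help fix ideas. It is not the authors' implementation: it illustrates the general optimistic-initial-model recipe on a flat (non-factored) MDP, seeding the empirical model with fake transitions to a fictitious maximal-value "garden of Eden" state. The class name, the eden_weight parameter, and all constants are illustrative assumptions.

```python
import numpy as np

class OptimisticInitialModel:
    """Flat-MDP sketch of the optimistic-initial-model idea behind FOIM.

    Index nS is a fictitious absorbing "garden of Eden" state with maximal
    value; every (s, a) pair starts with `eden_weight` fake visits to it,
    which makes the initial empirical model optimistic.
    """

    def __init__(self, n_states, n_actions, r_max, gamma=0.95, eden_weight=1.0):
        self.nS, self.nA, self.gamma = n_states, n_actions, gamma
        self.counts = np.zeros((n_states, n_actions, n_states + 1))
        self.counts[:, :, n_states] = eden_weight  # fake optimistic visits
        self.reward_sum = np.zeros((n_states, n_actions))
        self.V = np.zeros(n_states + 1)
        self.V[n_states] = r_max / (1.0 - gamma)   # Eden's (fixed) value

    def _model(self):
        # Empirical transition probabilities and mean rewards, Eden included.
        n = self.counts.sum(axis=2, keepdims=True)
        return self.counts / n, self.reward_sum / n[:, :, 0]

    def act(self, s):
        # Always greedy with respect to the current (optimistic) model.
        P, R = self._model()
        return int(np.argmax(R[s] + self.gamma * P[s] @ self.V))

    def update(self, s, a, r, s2, vi_sweeps=50):
        # Conventional model update, followed by value iteration on the
        # model (standing in here for FOIM's approximate value iteration).
        self.counts[s, a, s2] += 1
        self.reward_sum[s, a] += r
        P, R = self._model()
        for _ in range(vi_sweeps):
            q = R + self.gamma * np.tensordot(P, self.V, axes=([2], [0]))
            self.V[: self.nS] = q.max(axis=1)
```

Because the fake Eden mass is fixed while real counts grow, the optimism at each state-action pair decays exactly as fast as that pair is visited, so the model converges to the empirical one while early optimism drives exploration. This loosely mirrors how FOIM's single trick, optimistic initialization, can yield both convergence and a polynomial bound on non-near-optimal steps.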