Efficient reinforcement learning
COLT '94 Proceedings of the seventh annual conference on Computational learning theory
Inspired by recent results on polynomial-time reinforcement learning algorithms that accumulate near-optimal rewards, we study the related problem of quickly learning near-optimal policies. The new problem is clearly related to the previous one, but differs in important ways. We provide simple algorithms for MDPs and for zero-sum and common-payoff stochastic games, together with a uniform framework for proving their polynomial complexity. Unlike the previously studied problem, these bounds use the minimum of the mixing time and a new quantity, the spectral radius. Moreover, our results apply uniformly to both the average-reward and discounted cases.
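As an illustrative sketch only (the paper's "spectral radius" quantity may be defined differently), the standard connection between spectral properties and mixing can be seen on a small ergodic Markov chain: a row-stochastic transition matrix always has largest eigenvalue 1, and the second-largest eigenvalue modulus controls how quickly the chain forgets its starting state. The matrix `P` below is a made-up two-state example, not taken from the paper.

```python
import numpy as np

# Transition matrix of a small ergodic Markov chain (rows sum to 1).
# This chain is a hypothetical example for illustration.
P = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
])

# For a stochastic matrix the largest eigenvalue modulus is always 1;
# the second-largest modulus governs the geometric rate at which the
# chain converges to its stationary distribution.
moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
second = moduli[1]  # 0.7 for this chain

# Rough mixing-time proxy: steps until the influence of the starting
# state decays below a tolerance eps, since the error shrinks like
# second**t.
eps = 1e-3
mixing_proxy = int(np.ceil(np.log(1 / eps) / np.log(1 / second)))
print(second, mixing_proxy)
```

A smaller second eigenvalue modulus (a larger spectral gap) gives a shorter mixing time, which is why bounds stated in terms of a spectral quantity can beat bounds stated directly in terms of the mixing time for rapidly mixing chains.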