R-MAX: a general polynomial time algorithm for near-optimal reinforcement learning

  • Authors:
  • Ronen I. Brafman; Moshe Tennenholtz

  • Affiliations:
  • Computer Science Department, Ben-Gurion University, Beer-Sheva, Israel; Faculty of Industrial Engineering and Management, Technion, Haifa, Israel, and Computer Science Department, Stanford University, Stanford, CA

  • Venue:
  • IJCAI'01: Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2001

Abstract

R-MAX is a simple model-based reinforcement learning algorithm that attains near-optimal average reward in polynomial time. In R-MAX, the agent always maintains a complete, though possibly inaccurate, model of its environment and acts according to the optimal policy derived from this model. The model is initialized optimistically: all actions in all states are assumed to return the maximal possible reward (hence the name). During execution, the model is updated based on the agent's observations. R-MAX improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's E3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the "optimism under uncertainty" bias used in many RL algorithms. (4) It is much simpler and more general than Brafman and Tennenholtz's LSG algorithm for learning in single-controller stochastic games. (5) It generalizes the algorithm by Monderer and Tennenholtz for learning in repeated games. (6) It is the only algorithm for near-optimal learning in repeated games known to be polynomial, providing a much simpler and more efficient alternative to previous algorithms by Banos and by Megiddo.
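
The sketch below illustrates the core idea described in the abstract for the special case of a finite MDP (the paper itself also covers stochastic games and uses an average-reward, T-step analysis; discounted value iteration is used here only for simplicity). All names and the environment interface (env.reset(), env.step(s, a), n_states, m_known, etc.) are illustrative assumptions, not part of the paper.

```python
# Hedged sketch of an R-MAX-style learner for a tabular MDP, assuming an
# environment with reset() -> state and step(state, action) -> (next_state,
# reward), and a known maximal one-step reward R_max.
import numpy as np

def rmax_learn(env, n_states, n_actions, R_max,
               m_known=10, gamma=0.95, n_steps=10_000, vi_iters=200):
    S = n_states + 1                      # add one fictitious absorbing state
    fict = n_states                       # index of the fictitious state

    # Empirical statistics for each (state, action) pair
    counts = np.zeros((n_states, n_actions), dtype=int)
    trans_counts = np.zeros((n_states, n_actions, n_states), dtype=int)
    reward_sums = np.zeros((n_states, n_actions))

    def build_model():
        # Optimistic model: pairs visited fewer than m_known times lead to the
        # fictitious state with reward R_max; "known" pairs use empirical
        # transition frequencies and mean rewards.
        P = np.zeros((S, n_actions, S))
        R = np.full((S, n_actions), R_max)
        P[fict, :, fict] = 1.0            # fictitious state is absorbing
        for s in range(n_states):
            for a in range(n_actions):
                if counts[s, a] >= m_known:
                    P[s, a, :n_states] = trans_counts[s, a] / counts[s, a]
                    R[s, a] = reward_sums[s, a] / counts[s, a]
                else:
                    P[s, a, fict] = 1.0
        return P, R

    def solve(P, R):
        # Discounted value iteration on the current optimistic model.
        V = np.zeros(S)
        for _ in range(vi_iters):
            Q = R + gamma * (P @ V)       # (S, A) action values
            V = Q.max(axis=1)
        return Q.argmax(axis=1)           # greedy policy w.r.t. the model

    policy = solve(*build_model())
    s = env.reset()
    for _ in range(n_steps):
        a = policy[s]
        s_next, r = env.step(s, a)
        if counts[s, a] < m_known:
            counts[s, a] += 1
            trans_counts[s, a, s_next] += 1
            reward_sums[s, a] += r
            if counts[s, a] == m_known:   # a pair just became known: re-plan
                policy = solve(*build_model())
        s = s_next
    return policy[:n_states]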