Learning in embedded systems
Proceedings of the seventh international conference (1990) on Machine learning
Model-based average reward reinforcement learning
Artificial Intelligence
A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
Artificial Intelligence
Introduction to Reinforcement Learning
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Near-Optimal Reinforcement Learning in Polynomial Time
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Efficient Reinforcement Learning in Factored MDPs
IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
Dynamic non-Bayesian decision making
Journal of Artificial Intelligence Research
Artificial Intelligence Review
Dopamine: generalization and bonuses
Neural Networks - Computational models of neuromodulation
Control of exploitation-exploration meta-parameter in reinforcement learning
Neural Networks - Computational models of neuromodulation
Game Theory and Artificial Intelligence
Selected papers from the UKMAS Workshop on Foundations and Applications of Multi-Agent Systems
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Polynomial-time reinforcement learning of near-optimal policies
Eighteenth national conference on Artificial intelligence
Nash Q-learning for general-sum stochastic games
The Journal of Machine Learning Research
Efficient learning equilibrium
Artificial Intelligence
Bayesian sparse sampling for on-line reward optimization
ICML '05 Proceedings of the 22nd international conference on Machine learning
Model-based function approximation in reinforcement learning
Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
The many faces of optimism: a unifying approach
Proceedings of the 25th international conference on Machine learning
Optimistic initialization and greediness lead to polynomial time learning in factored MDPs
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Generalized model learning for reinforcement learning in factored domains
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Using linear programming for Bayesian exploration in Markov decision processes
IJCAI'07 Proceedings of the 20th international joint conference on Artificial Intelligence
Learning the behavior model of a robot
Autonomous Robots
Asymptotic learnability of reinforcement problems with arbitrary dependence
ALT'06 Proceedings of the 17th international conference on Algorithmic Learning Theory
Learning in one-shot strategic form games
ECML'06 Proceedings of the 17th European conference on Machine Learning
Learning exploration strategies in model-based reinforcement learning
Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
R-MAX is a simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-MAX, the agent always maintains a complete, but possibly inaccurate, model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, the model is updated based on the agent's observations. R-MAX improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's E3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the "optimism under uncertainty" bias used in many RL algorithms. (4) It is much simpler and more general than Brafman and Tennenholtz's LSG algorithm for learning in single-controller stochastic games. (5) It generalizes the algorithm by Monderer and Tennenholtz for learning in repeated games. (6) It is the only algorithm for near-optimal learning in repeated games known to be polynomial, providing a much simpler and more efficient alternative to previous algorithms by Banos and by Megiddo.
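For the single-agent (MDP) case, the optimistic-initialization idea can be illustrated with a short tabular sketch. The Python code below is only an illustrative approximation under stated assumptions, not the procedure analyzed in the paper: it plans with discounted value iteration rather than the T-step average-reward planner, and it assumes a hypothetical environment object exposing reset() and step(action) returning (next_state, reward, done). The visit threshold m (after which a state-action pair is treated as "known") and all function and parameter names are illustrative choices.

import numpy as np

def plan(T, R, gamma=0.95, n_iter=200):
    # Value iteration on the current (possibly optimistic) model.
    # T: [S, A, S] transition probabilities, R: [S, A] expected rewards.
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = R + gamma * T.reshape(S * A, S).dot(V).reshape(S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def rmax(env, n_states, n_actions, r_max, m=10, gamma=0.95, n_episodes=100, horizon=50):
    # Tabular R-MAX sketch: unknown (s, a) pairs are modeled as leading to a
    # fictitious absorbing state that pays r_max forever, so the planned policy
    # is drawn toward them (exploration); once a pair has been tried m times,
    # its empirical model replaces the optimistic one and the agent replans.
    S = n_states + 1  # extra index: fictitious absorbing state
    counts = np.zeros((n_states, n_actions), dtype=int)
    trans_counts = np.zeros((n_states, n_actions, n_states))
    reward_sums = np.zeros((n_states, n_actions))

    # Optimistic initial model: every action leads to the absorbing state with reward r_max.
    T = np.zeros((S, n_actions, S))
    T[:, :, n_states] = 1.0
    R = np.full((S, n_actions), r_max, dtype=float)

    policy = plan(T, R, gamma)
    for _ in range(n_episodes):
        s = env.reset()                       # assumed interface: returns an integer state index
        for _ in range(horizon):
            a = policy[s]
            s2, r, done = env.step(a)         # assumed interface
            if counts[s, a] < m:
                counts[s, a] += 1
                trans_counts[s, a, s2] += 1
                reward_sums[s, a] += r
                if counts[s, a] == m:         # pair becomes "known": plug in the empirical model
                    T[s, a, :] = 0.0
                    T[s, a, :n_states] = trans_counts[s, a] / m
                    R[s, a] = reward_sums[s, a] / m
                    policy = plan(T, R, gamma)  # replan only when the model changes
            s = s2
            if done:
                break
    return policy[:n_states]

Replanning happens only when a state-action pair crosses the threshold m, so the number of planning calls is bounded by the number of state-action pairs; this mirrors, in simplified form, the mechanism behind the paper's polynomial-time guarantee.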