R-max - a general polynomial time algorithm for near-optimal reinforcement learning

Authors: Ronen I. Brafman; Moshe Tennenholtz
Affiliations: Computer Science Department, Ben-Gurion University, Beer-Sheva, Israel 84105; Computer Science Department, Stanford University, Stanford, CA
Venue: The Journal of Machine Learning Research
Year: 2003

References (12):

Integrated architecture for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the seventh international conference (1990) on Machine learning.
Learning in embedded systems.
Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time. Machine Learning.
Model-based average reward reinforcement learning. Artificial Intelligence.
A near-optimal polynomial time algorithm for learning in certain classes of stochastic games. Artificial Intelligence.
Introduction to Reinforcement Learning.
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning.
Near-Optimal Reinforcement Learning in Polynomial Time. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning.
Efficient Reinforcement Learning in Factored MDPs. IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence.
A Generalized Reinforcement-Learning Model: Convergence and Applications.
Reinforcement learning: a survey. Journal of Artificial Intelligence Research.
Dynamic non-Bayesian decision making. Journal of Artificial Intelligence Research.

Cited by: 98


Abstract

R-MAX is a very simple model-based reinforcement learning algorithm that can attain near-optimal average reward in polynomial time. In R-MAX, the agent always maintains a complete, but possibly inaccurate, model of its environment and acts according to the optimal policy derived from this model. The model is initialized optimistically: all actions in all states return the maximal possible reward (hence the name). During execution, the model is updated based on the agent's observations. R-MAX improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's E3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the "optimism under uncertainty" bias used in many RL algorithms. (4) It is simpler, more general, and more efficient than Brafman and Tennenholtz's LSG algorithm for learning in single-controller stochastic games. (5) It generalizes the algorithm by Monderer and Tennenholtz for learning in repeated games. (6) To date, it is the only provably efficient algorithm for learning in repeated games, considerably improving on and simplifying previous algorithms by Banos and by Megiddo.
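
The loop described in the abstract (optimistic model, act greedily on the model, update from observations) can be illustrated with a minimal Python sketch. This is not the paper's algorithm as published: it assumes a single-agent finite MDP rather than a stochastic game, and it swaps in discounted value iteration for the paper's average-reward planning. The function `rmax_agent`, the callback `step`, the parameter `known_threshold`, and the toy environment `chain_step` are hypothetical names introduced only for illustration.

```python
def rmax_agent(n_states, n_actions, r_max, step, start_state,
               known_threshold=5, gamma=0.95, n_steps=500, vi_sweeps=100):
    """Illustrative R-max-style loop (assumed interface, not the paper's).

    `step(s, a)` is an assumed environment callback returning (reward, next_state).
    """
    counts = [[0] * n_actions for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    next_counts = [[[0] * n_states for _ in range(n_actions)]
                   for _ in range(n_states)]
    # Value of collecting r_max forever under discounting (assumption: discounted
    # planning stands in for the paper's average-reward criterion).
    v_max = r_max / (1.0 - gamma)

    def plan():
        # Value iteration on the current model. Unknown state-action pairs keep
        # the optimistic value v_max, which is what drives exploration.
        q = [[v_max] * n_actions for _ in range(n_states)]
        for _ in range(vi_sweeps):
            v = [max(row) for row in q]
            for s in range(n_states):
                for a in range(n_actions):
                    n = counts[s][a]
                    if n < known_threshold:
                        q[s][a] = v_max
                    else:
                        r_hat = reward_sum[s][a] / n
                        exp_v = sum(c * v[s2] for s2, c
                                    in enumerate(next_counts[s][a])) / n
                        q[s][a] = r_hat + gamma * exp_v
        return q

    q = plan()
    s, total = start_state, 0.0
    for _ in range(n_steps):
        # Always act greedily with respect to the (optimistic) model.
        a = max(range(n_actions), key=lambda act: q[s][act])
        r, s2 = step(s, a)
        total += r
        if counts[s][a] < known_threshold:
            counts[s][a] += 1
            reward_sum[s][a] += r
            next_counts[s][a][s2] += 1
            if counts[s][a] == known_threshold:
                q = plan()  # the pair just became "known": re-plan
        s = s2
    return q, counts, total


def chain_step(s, a):
    """Toy two-state deterministic MDP, made up for this sketch."""
    if s == 0:
        return (0.1, 0) if a == 0 else (0.0, 1)
    return (0.0, 0) if a == 0 else (1.0, 1)


q, counts, total = rmax_agent(2, 2, r_max=1.0, step=chain_step, start_state=0)
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print(greedy, counts, total)
```

In this sketch there is no separate exploration rule: under-sampled state-action pairs simply look maximally rewarding, so the greedy policy visits them until they become known. The threshold of 5 visits is arbitrary here; the paper derives the required number of samples from its accuracy and confidence parameters.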