Near-Optimal Reinforcement Learning in Polynomial Time

  • Authors:
  • Michael Kearns; Satinder Singh

  • Affiliations:
  • Michael Kearns: Department of Computer and Information Science, University of Pennsylvania, Moore School Building, 200 South 33rd Street, Philadelphia, PA 19104-6389, USA. mkearns@cis.upenn.edu
  • Satinder Singh: Syntek Capital, New York, NY 10019, USA. satinder.baveja@syntekcapital.com

  • Venue:
  • Machine Learning
  • Year:
  • 2002

Abstract

We present new algorithms for reinforcement learning and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the mixing time T of the optimal policy (in the undiscounted case) or by the horizon time T (in the discounted case), we give algorithms requiring a number of actions and total computation time that are only polynomial in T and the number of states and actions, for both the undiscounted and discounted cases. An interesting aspect of our algorithms is their explicit handling of the exploration-exploitation trade-off.
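
The last point of the abstract, the explicit handling of the exploration-exploitation trade-off, is the heart of the algorithm the paper introduces (commonly known as E^3, "Explicit Explore or Exploit"). The sketch below is only a schematic in that spirit, not the authors' construction: it assumes a small tabular MDP with hand-made empirical counts, an illustrative "known state" threshold M_KNOWN, a planning horizon HORIZON standing in for the T of the abstract, and an arbitrary exploit threshold. It contrasts planning in an empirical known-state model against planning in an exploration model whose only reward lies in reaching insufficiently visited states.

    # explore_or_exploit_sketch.py -- a minimal, illustrative sketch of an explicit
    # explore-or-exploit decision over a "known-state" model.  The toy counts, the
    # M_KNOWN threshold, the planning horizon, and the exploit threshold are all
    # assumptions made for this example, not values from the paper.

    N_STATES, N_ACTIONS = 3, 2   # states 0 and 1 are well explored; state 2 is not
    HORIZON = 20                 # planning horizon (the T of the abstract)
    M_KNOWN = 100                # visits per (state, action) before a state is "known"

    # Empirical experience gathered so far: counts[s][a][s'] and reward_sum[s][a].
    counts = [[[80, 15, 5], [10, 85, 5]],
              [[60, 30, 10], [20, 70, 10]],
              [[5, 5, 2], [3, 4, 1]]]
    reward_sum = [[90.0, 20.0], [50.0, 60.0], [6.0, 2.0]]

    def known_states():
        """States whose every action has been tried at least M_KNOWN times."""
        return {s for s in range(N_STATES)
                if all(sum(counts[s][a]) >= M_KNOWN for a in range(N_ACTIONS))}

    def build_model(explore):
        """Known-state MDP: unknown states are absorbing.  In exploit mode they pay
        0 while known states pay their empirical mean reward; in explore mode the
        rewards are flipped, so the optimal policy escapes to unknown states fast."""
        known = known_states()
        P = [[None] * N_ACTIONS for _ in range(N_STATES)]
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                if s not in known:
                    P[s][a] = [(1.0, s, 1.0 if explore else 0.0)]
                    continue
                n = sum(counts[s][a])
                r = 0.0 if explore else reward_sum[s][a] / n
                P[s][a] = [(counts[s][a][s2] / n, s2, r)
                           for s2 in range(N_STATES) if counts[s][a][s2] > 0]
        return P

    def plan(P):
        """Finite-horizon value iteration on a tabular model P[s][a] = [(p, s', r)]."""
        V, pi = [0.0] * N_STATES, [0] * N_STATES
        for _ in range(HORIZON):
            V, old = [0.0] * N_STATES, V
            for s in range(N_STATES):
                qs = [sum(p * (r + old[s2]) for p, s2, r in P[s][a])
                      for a in range(N_ACTIONS)]
                V[s], pi[s] = max(qs), qs.index(max(qs))
        return V, pi

    def choose_action(s, exploit_threshold=0.5 * HORIZON):
        if s not in known_states():
            # Balanced wandering: in an unknown state, try the least-tried action.
            return min(range(N_ACTIONS), key=lambda a: sum(counts[s][a]))
        v_exploit, pi_exploit = plan(build_model(explore=False))
        v_explore, pi_explore = plan(build_model(explore=True))
        # The explicit trade-off: exploit if the known-state model already promises
        # enough return over the horizon, otherwise follow the escaping policy.
        return pi_exploit[s] if v_exploit[s] >= exploit_threshold else pi_explore[s]

    print(choose_action(0))   # plans in both models and picks the exploit action here

The structure of choose_action is the point of the sketch: either the empirical known-state model already supports sufficiently high return over the horizon, or planning in the exploration model yields a policy that reaches a poorly explored state quickly. The paper's contribution is to make this dichotomy precise and to show that it yields near-optimal return using only a number of actions and computation time polynomial in T and in the numbers of states and actions.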