Reinforcement Learning in Finite MDPs: PAC Analysis

Authors:
Alexander L. Strehl; Lihong Li; Michael L. Littman
Venue:
The Journal of Machine Learning Research
Year:
2009

Abstract

We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. Algorithms that achieve this, called "PAC-MDP" algorithms, include the well-known E³ and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state of the art by presenting bounds for the problem in a unified theoretical framework. A more refined analysis of upper and lower bounds is presented to yield insight into the differences between the model-free Delayed Q-learning and the model-based R-MAX.
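
To make the model-based side of that comparison concrete, the following is a minimal sketch of the optimism idea commonly associated with R-MAX, not the paper's own pseudocode. It assumes rewards lie in [0, r_max], treats a state-action pair as "known" once it has been tried at least m times, and assigns every unknown pair the maximum possible value r_max / (1 - gamma). The function name, the dictionary layout, and the parameters m, gamma, r_max, and sweeps are all illustrative assumptions.

```python
def rmax_q_values(counts, reward_sums, next_counts, n_states, n_actions,
                  m=5, gamma=0.9, r_max=1.0, sweeps=200):
    """Value iteration on an optimistic, R-MAX-style empirical model.

    counts[(s, a)]          -> number of times (s, a) was tried
    reward_sums[(s, a)]     -> sum of rewards observed for (s, a)
    next_counts[(s, a)][s2] -> number of observed transitions to s2
    All names and the data layout are hypothetical, for illustration only.
    """
    v_max = r_max / (1.0 - gamma)  # largest achievable discounted return
    q = [[v_max] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s in range(n_states):
            for a in range(n_actions):
                n = counts.get((s, a), 0)
                if n < m:
                    # Unknown pair: keep the optimistic value so that the
                    # greedy policy is drawn toward trying it.
                    q[s][a] = v_max
                    continue
                # Known pair: back up through the empirical model.
                r_hat = reward_sums[(s, a)] / n
                expected_next = sum(
                    (c / n) * max(q[s2])
                    for s2, c in next_counts[(s, a)].items()
                )
                q[s][a] = r_hat + gamma * expected_next
    return q


# Usage: one state, two actions; action 0 is known, action 1 is untried.
q = rmax_q_values(
    counts={(0, 0): 2},
    reward_sums={(0, 0): 1.0},
    next_counts={(0, 0): {0: 2}},
    n_states=1, n_actions=2, m=2,
)
greedy_action = max(range(2), key=lambda a: q[0][a])  # selects the untried action
```

Acting greedily with respect to these values favors under-sampled pairs until they become known, which is the exploration mechanism that sample-complexity bounds for this family of algorithms reason about. A model-free method such as Delayed Q-learning instead stores only value estimates and does not keep the transition counts used here.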