PAC bounds for discounted MDPs

Authors:
Tor Lattimore; Marcus Hutter
Affiliations:
Australian National University, Australia; Australian National University, Australia
Venue:
ALT'12 Proceedings of the 23rd international conference on Algorithmic Learning Theory
Year:
2012



Abstract

We study upper and lower bounds on the sample complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by being both more general (it applies to all policies) and tighter. The upper and lower bounds match up to logarithmic factors provided the transition matrix is not too dense.
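
The shape of the bounds described in the abstract can be sketched schematically. This is a reading under stated assumptions, not a statement taken from this record: the symbols S, A, T, γ, ε and δ are introduced here for illustration, and the 1/ε² · log(1/δ) factor is the dependence typical of PAC bounds of this kind rather than something the abstract itself states.

```latex
% Assumed notation (not defined in the record above):
%   S = state set, A = action set,
%   T = number of non-zero transition probabilities,
%   \gamma = discount factor, \epsilon = accuracy, \delta = failure probability.
\[
  \text{upper bound: }
  \tilde{O}\!\left( \frac{T}{\epsilon^{2}(1-\gamma)^{3}} \log\frac{1}{\delta} \right),
  \qquad
  \text{lower bound: }
  \Omega\!\left( \frac{|S||A|}{\epsilon^{2}(1-\gamma)^{3}} \log\frac{1}{\delta} \right),
\]
\[
  |S||A| \;\le\; T \;\le\; |S|^{2}|A| .
\]
```

Read this way, "cubic dependence on the horizon" corresponds to the factor 1/(1-γ)³, and the two bounds coincide up to logarithmic factors when T is of order |S||A|, that is, when each state-action pair has only a few possible successor states.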