An analysis of model-based Interval Estimation for Markov Decision Processes

Authors:
Alexander L. Strehl; Michael L. Littman
Affiliations:
Yahoo! Inc, 701 First Avenue, Sunnyvale, California 94089, USA; Computer Science Department, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, USA
Venue:
Journal of Computer and System Sciences
Year:
2008

Abstract

Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less "online" cousins from the literature.
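
The abstract does not spell out either idea, so the following is only a rough sketch under stated assumptions. It assumes MBIE-EB solves the empirical model with an exploration bonus of beta / sqrt(n(s, a)) added to each state-action value, and that average loss is the mean, over the steps of a trial, of the optimal value of the visited state minus the discounted return actually collected from that step onward. The function names, the table layout, the default parameters, and the optimistic treatment of untried pairs are all illustrative choices, not taken from the paper.

```python
import math


def mbie_eb_q_values(counts, reward_sums, next_counts,
                     gamma=0.95, beta=1.0, r_max=1.0, sweeps=200):
    """Value iteration on the empirical model plus an exploration bonus.

    counts[s][a]          : n(s, a), times action a was tried in state s
    reward_sums[s][a]     : total reward observed for (s, a)
    next_counts[s][a][s2] : times (s, a) was followed by state s2
    All defaults are illustrative assumptions.
    """
    n_states = len(counts)
    n_actions = len(counts[0])
    v_max = r_max / (1.0 - gamma)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        # State values from the previous sweep's action values.
        v = [max(row) for row in q]
        for s in range(n_states):
            for a in range(n_actions):
                n = counts[s][a]
                if n == 0:
                    # Assumption: untried pairs get a fixed optimistic value.
                    q[s][a] = v_max
                    continue
                r_hat = reward_sums[s][a] / n
                expected_next = sum(
                    next_counts[s][a][s2] / n * v[s2]
                    for s2 in range(n_states)
                )
                bonus = beta / math.sqrt(n)  # shrinks as (s, a) is tried more
                q[s][a] = r_hat + gamma * expected_next + bonus
    return q


def average_loss(optimal_values, rewards, gamma):
    """Mean over t of V*(s_t) minus the discounted return from step t on.

    optimal_values[t] is V*(s_t), supplied by the caller; rewards[t] is
    the reward received at step t of a single finite trial.
    """
    total = 0.0
    discounted_return = 0.0
    # Walk backward so each step's return reuses the next step's return.
    for t in reversed(range(len(rewards))):
        discounted_return = rewards[t] + gamma * discounted_return
        total += optimal_values[t] - discounted_return
    return total / len(rewards)
```

In this sketch, two actions with identical empirical rewards and transitions differ only by their bonus, so a greedy agent prefers the one it has tried less often. The average-loss helper needs the optimal values as input, which makes it suitable for evaluating an agent in a known test environment rather than for use inside the learner.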