A theoretical analysis of Model-Based Interval Estimation

Authors: Alexander L. Strehl; Michael L. Littman
Affiliations: Rutgers University, Piscataway, NJ (both authors)
Venue: ICML '05, Proceedings of the 22nd International Conference on Machine Learning
Year: 2005

Abstract

Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-Based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less "online" counterparts in the literature.
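The abstract does not spell out how MBIE balances exploration and exploitation. The sketch below illustrates the usual reading of interval estimation for MDPs: keep confidence intervals around the empirical rewards and transition probabilities, then plan with the most optimistic model that fits inside them, so under-sampled state-action pairs look attractive. The interval widths (a Hoeffding-style bound for rewards, an L1 deviation bound for transitions), the assumption that rewards lie in [0, 1], and all function names here are illustrative assumptions, not details taken from this page.

```python
import math


def transition_ci_width(n, num_states, delta):
    """Assumed L1 confidence radius for an empirical next-state distribution
    estimated from n samples over num_states outcomes (num_states >= 2)."""
    if n == 0:
        return 2.0  # no data: the L1 distance between distributions is at most 2
    return math.sqrt(2.0 * (math.log(2 ** num_states - 2) - math.log(delta)) / n)


def reward_ci_width(n, delta):
    """Assumed Hoeffding-style radius for a mean reward in [0, 1] from n samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def optimistic_distribution(p_hat, values, eps):
    """Most optimistic distribution within L1 distance eps of p_hat: move up to
    eps/2 probability mass onto the highest-valued next state, taking it from
    the lowest-valued states first."""
    p = list(p_hat)
    order = sorted(range(len(p)), key=lambda s: values[s])  # ascending value
    best = order[-1]
    shift = min(eps / 2.0, 1.0 - p[best])
    p[best] += shift
    remaining = shift
    for s in order[:-1]:  # every state except the best one, worst first
        if remaining <= 0.0:
            break
        take = min(p[s], remaining)
        p[s] -= take
        remaining -= take
    return p


def mbie_q_values(counts, reward_sums, gamma, delta, iters=500):
    """Optimistic value iteration over the empirical model (hypothetical helper).
    counts[s][a][s2]: observed transitions; reward_sums[s][a]: summed rewards."""
    num_states, num_actions = len(counts), len(counts[0])
    v_max = 1.0 / (1.0 - gamma)  # largest discounted return when rewards are in [0, 1]
    q = [[v_max] * num_actions for _ in range(num_states)]
    for _ in range(iters):
        v = [max(row) for row in q]
        new_q = [[v_max] * num_actions for _ in range(num_states)]
        for s in range(num_states):
            for a in range(num_actions):
                n = sum(counts[s][a])
                if n == 0:
                    continue  # unvisited pair keeps the optimistic value v_max
                r_opt = reward_sums[s][a] / n + reward_ci_width(n, delta)
                p_hat = [c / n for c in counts[s][a]]
                p_opt = optimistic_distribution(
                    p_hat, v, transition_ci_width(n, num_states, delta))
                new_q[s][a] = r_opt + gamma * sum(p * val for p, val in zip(p_opt, v))
        q = new_q
    return q
```

An agent following this scheme would act greedily with respect to the returned Q-values and recompute them as counts grow; because the intervals shrink with more samples, the optimism bonus fades for well-explored pairs and the agent shifts toward exploitation.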