Simple model-based exploration and exploitation of Markov decision processes using the elimination algorithm

Authors:
Elizabeth Novoa
Affiliations:
Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Santiago, Chile
Venue:
MICAI'07 Proceedings of the artificial intelligence 6th Mexican international conference on Advances in artificial intelligence
Year:
2007

Citing 18
Cited 0

Associative Reinforcement Learning: Functions in k-DNF

Machine Learning
Bayesian Q-learning

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Learning in graphical models

Learning in graphical models
Complexity of finite-horizon Markov decision process problems

Journal of the ACM (JACM)
Bayesian Learning for Neural Networks

Bayesian Learning for Neural Networks
Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes

Machine Learning
Near-Optimal Reinforcement Learning in Polynomial Time

Machine Learning
A Bayesian Framework for Reinforcement Learning

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Artificial Intelligence: A Modern Approach

Artificial Intelligence: A Modern Approach
Optimal learning: computational procedures for bayes-adaptive markov decision processes

Optimal learning: computational procedures for bayes-adaptive markov decision processes
R-max - a general polynomial time algorithm for near-optimal reinforcement learning

The Journal of Machine Learning Research
Bayesian sparse sampling for on-line reward optimization

ICML '05 Proceedings of the 22nd international conference on Machine learning
An analytic solution to discrete Bayesian reinforcement learning

ICML '06 Proceedings of the 23rd international conference on Machine learning
Nonapproximability results for partially observable Markov decision processes

Journal of Artificial Intelligence Research
Reinforcement learning: a survey

Journal of Artificial Intelligence Research
Using linear programming for Bayesian exploration in Markov decision processes

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Model based Bayesian exploration

UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

The fundamental problemin learning and planning of Markov Decision Processes is how the agent explores and exploits an uncertain environment. The classical solutions to the problem are basically heuristics that lack appropriate theoretical justifications. As a result, principled solutions based on Bayesian estimation, though intractable even in small cases, have been recently investigated. The common approach is to approximate Bayesian estimation with sophisticated methods that cope the intractability of computing the Bayesian posterior. However, we notice that the complexity of these approximations still prevents their use as the long-term reward gain improvement seems to be diminished by the difficulties of implementation. In this work, we propose a deliberately simplistic model-based algorithm to show the benefits of Bayesian estimation when compared to classical model-free solutions. In particular, our agent combines several Markov Chains from its belief state and uses the matrix-based Elimination Algorithm to find the best action to take. We test our agent over the three standard problems Chain, Loop, and Maze, and find that it outperforms the classical Q-Learning with e-Greedy, Boltzmann, and Interval Estimation action selection heuristics.