Active learning of MDP models

  • Authors:
  • Mauricio Araya-López, Olivier Buffet, Vincent Thomas, François Charpillet

  • Affiliation:
  • Nancy Université / INRIA LORIA, Vandoeuvre-lès-Nancy Cedex, France (all authors)

  • Venue:
  • EWRL'11: Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning
  • Year:
  • 2011

Abstract

We consider the active learning problem of inferring the transition model of a Markov Decision Process by acting and observing transitions. This is particularly useful when no reward function is defined a priori. We propose to cast the active learning task as a utility-maximization problem, using Bayesian reinforcement learning with belief-dependent rewards. After presenting three possible performance criteria, we derive from them the belief-dependent rewards to be used in the decision-making process. Since computing the optimal Bayesian value function is intractable for large horizons, we use a simple algorithm to solve this optimization problem approximately. Despite the sub-optimality of this technique, we show experimentally that our proposal is efficient in a number of domains.
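
The abstract does not specify the belief representation, the three criteria, or the approximate solver. Purely as an illustration of the general idea, the sketch below assumes a Dirichlet posterior over each state-action transition distribution and uses one-step expected entropy reduction (an information-gain-style belief-dependent reward), with a myopic greedy policy standing in for the paper's approximate solver. All class and function names here are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.special import gammaln, psi


def dirichlet_entropy(alpha):
    """Differential entropy of a Dirichlet distribution with parameters alpha."""
    a0 = alpha.sum()
    k = len(alpha)
    log_beta = gammaln(alpha).sum() - gammaln(a0)
    return log_beta + (a0 - k) * psi(a0) - ((alpha - 1.0) * psi(alpha)).sum()


class DirichletBelief:
    """Independent Dirichlet posterior over T(.|s,a) for each state-action pair."""

    def __init__(self, n_states, n_actions, prior=1.0):
        self.alpha = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        """Bayesian update after observing transition (s, a, s_next)."""
        self.alpha[s, a, s_next] += 1.0

    def expected_info_gain(self, s, a):
        """Expected entropy reduction of the belief on T(.|s,a) after one more
        observation, averaged under the current mean transition model.
        This plays the role of a belief-dependent reward."""
        alpha = self.alpha[s, a]
        mean = alpha / alpha.sum()
        h_now = dirichlet_entropy(alpha)
        gain = 0.0
        for s_next, p in enumerate(mean):
            alpha_next = alpha.copy()
            alpha_next[s_next] += 1.0
            gain += p * (h_now - dirichlet_entropy(alpha_next))
        return gain


def greedy_active_step(belief, s):
    """Myopic policy: pick the action with the highest one-step expected
    information gain (a stand-in for the paper's approximate solver)."""
    n_actions = belief.alpha.shape[1]
    gains = [belief.expected_info_gain(s, a) for a in range(n_actions)]
    return int(np.argmax(gains))


# Toy usage: actively explore a random 5-state, 2-action MDP for 100 steps.
rng = np.random.default_rng(0)
true_T = rng.dirichlet(np.ones(5), size=(5, 2))
belief = DirichletBelief(5, 2)
s = 0
for _ in range(100):
    a = greedy_active_step(belief, s)
    s_next = rng.choice(5, p=true_T[s, a])
    belief.update(s, a, s_next)
    s = s_next
```

The myopic rule is the simplest possible baseline: it ignores the long-horizon value of information that the full Bayesian formulation captures, which is precisely the gap the paper's utility-maximization framing, and its approximate solution, is meant to address.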