Interactive value iteration for Markov decision processes with unknown rewards

  • Authors:
  • Paul Weng; Bruno Zanuttini

  • Affiliations:
  • UPMC, LIP6, France; University of Caen, France

  • Venue:
  • IJCAI'13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
  • Year:
  • 2013

Abstract

To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help compare sequences of rewards. We first show how the set of possible reward functions for a given preference relation can be represented as a polytope. Then our algorithm, called Interactive Value Iteration, searches for an optimal policy while refining its knowledge about the possible reward functions, querying the tutor only when necessary. We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results demonstrating that our approach is efficient in practice.
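
To make the abstract's idea concrete, below is a minimal sketch of an interactive value iteration loop, not the authors' exact algorithm. It assumes a simplified setting in which rewards are linear in an unknown weight vector w (so r(s,a) = phi(s,a)·w and the admissible w form a polytope of linear inequalities A w <= 0 with w in [0,1]^d), and it models the tutor as a callback comparing two vector-valued returns. The function names (`dominates`, `interactive_vi`, `tutor`) and the use of scipy's LP solver for dominance tests are illustrative choices, not from the paper.

```python
# Hedged sketch: interactive value iteration with a polytope of reward weights.
# Assumptions (not from the paper): linear rewards r(s,a) = phi(s,a)·w,
# w constrained to [0,1]^d plus tutor-derived linear cuts, dominance via LP.
import numpy as np
from scipy.optimize import linprog

def dominates(u, v, A, b):
    """True iff u·w >= v·w for every w in {w : A w <= b, 0 <= w <= 1},
    i.e. min_w (u - v)·w >= 0 over the current polytope."""
    A_ub = np.array(A) if A else None
    b_ub = np.array(b) if b else None
    res = linprog(u - v, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, 1.0)] * len(u))
    return res.success and res.fun >= -1e-9

def interactive_vi(phi, P, gamma, tutor, n_iter=100):
    """phi[s][a]: feature vector of r(s,a); P[s][a][t]: transition probability.
    Value vectors V[s] live in feature space, so V[s]·w is the scalar value."""
    S, nA, d = len(P), len(P[0]), len(phi[0][0])
    V = np.zeros((S, d))
    A, b = [], []  # polytope constraints, grown by tutor answers
    for _ in range(n_iter):
        newV = np.empty_like(V)
        for s in range(S):
            # Vector-valued Q for each action.
            Q = [phi[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))
                 for a in range(nA)]
            best = Q[0]
            for q in Q[1:]:
                if dominates(best, q, A, b):
                    continue            # current best provably at least as good
                if dominates(q, best, A, b):
                    best = q            # q provably at least as good
                    continue
                # Polytope cannot resolve the comparison: query the tutor.
                if tutor(q, best):      # True means q is preferred to best
                    A.append(best - q); b.append(0.0)  # cut: (best - q)·w <= 0
                    best = q
                else:
                    A.append(q - best); b.append(0.0)  # cut: (q - best)·w <= 0
            newV[s] = best
        V = newV
    return V, (A, b)
```

In this sketch, each tutor answer adds one linear cut to the polytope, so later Bellman backups resolve more comparisons without queries; this is the mechanism behind the kind of query bound the abstract claims, though the paper's actual bound applies to its own algorithm, not this simplification.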