Multi-value-functions: efficient automatic action hierarchies for multiple goal MDPs

  • Authors:
  • Andrew W. Moore; Leemon C. Baird; Leslie Kaelbling

  • Affiliations:
  • Carnegie Mellon University; Carnegie Mellon University; Brown University

  • Venue:
  • IJCAI'99 Proceedings of the 16th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1999

Abstract

If you have planned to achieve one particular goal in a stochastic delayed-rewards problem and then someone asks about a different goal, what should you do? What if you need to be ready to quickly supply an answer for any possible goal? This paper shows that, by using a new kind of automatically generated abstract action hierarchy, preparing for all N possible goals in an MDP with N states can be much cheaper than N times the work of preparing for one goal. In goal-based Markov Decision Problems, it is usual to generate a policy π(x) mapping states to actions, and a value function J(x) mapping each state x to an estimate of the minimum expected cost-to-goal starting at x. In this paper we use the terminology that a multi-policy π*(x, y) (defined for all state pairs (x, y)) maps a state x to the first action it should take in order to reach y with minimum expected cost, and a multi-value-function J*(x, y) gives that minimum cost. Building these objects quickly and with little memory is the main purpose of this paper, but a secondary result is a natural, automatic way to create a set of parsimonious yet powerful abstract actions for MDPs. The paper concludes with a set of empirical results on increasingly large MDPs.
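
To make the central objects concrete, here is a minimal sketch of the multi-value-function J*(x, y) and multi-policy π*(x, y) computed the naive way, one value iteration per goal. The toy corridor MDP, state set, and function names are assumptions introduced purely for illustration; this is the expensive baseline whose N-fold cost the paper's abstract action hierarchies are designed to avoid, not the paper's method itself.

```python
# Naive baseline: compute J*(x, y) and pi*(x, y) by running one value
# iteration per goal y. Toy deterministic corridor MDP (an assumption for
# illustration): states 0..4, actions "left"/"right", unit cost per step.

STATES = range(5)
ACTIONS = ("left", "right")

def step(s, a):
    """Deterministic transition: move one cell, staying inside the corridor."""
    return max(s - 1, 0) if a == "left" else min(s + 1, 4)

def value_iteration_for_goal(goal, n_iters=50):
    """Return J(x) = minimum cost-to-goal and the greedy policy pi(x)."""
    J = {s: 0.0 if s == goal else float("inf") for s in STATES}
    for _ in range(n_iters):
        for s in STATES:
            if s == goal:
                continue
            J[s] = min(1.0 + J[step(s, a)] for a in ACTIONS)
    pi = {s: min(ACTIONS, key=lambda a: 1.0 + J[step(s, a)])
          for s in STATES if s != goal}
    return J, pi

# Multi-value-function and multi-policy: solve once for every possible goal y.
J_star, pi_star = {}, {}
for y in STATES:
    J_y, pi_y = value_iteration_for_goal(y)
    for x in STATES:
        J_star[(x, y)] = J_y[x]
        if x != y:
            pi_star[(x, y)] = pi_y[x]

print(J_star[(0, 4)])   # 4.0: four steps from one end of the corridor to the other
print(pi_star[(0, 4)])  # 'right': first action on the cheapest route to goal 4
```

Note the cost: for N states this repeats a full value-iteration sweep N times, which is exactly the N-fold blow-up the paper's automatically generated action hierarchies aim to undercut.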