Generating hierarchical structure in reinforcement learning from state variables

  • Authors: Bernhard Hengst
  • Affiliations: School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
  • Venue: PRICAI '00: Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence
  • Year: 2000

Abstract

This paper presents the CQ algorithm, which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs using state variables. The CQ algorithm uses a heuristic that applies to problems which can be modelled by a set of state variables conforming to a special ordering, defined in this paper as a "nested Markov ordering". The benefits of this approach are: (1) the automatic generation of actions and termination conditions at all levels in the hierarchy, and (2) linear scaling with the number of variables under certain conditions. The approach draws heavily on Dietterich's MAXQ value function decomposition and on the region-based decomposition of MDPs by Hauskrecht, Meuleau, Kaelbling, Dean, and Boutilier, among others. The CQ algorithm is described and its functionality illustrated using a four-room example. Solutions with different numbers of hierarchical levels are generated to solve Dietterich's taxi tasks.
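
As a rough illustration of the kind of decomposition the abstract describes, the sketch below splits the four-room example's state into a slowly changing room variable and a quickly changing in-room position, then solves the in-room navigation subtask once so it can be reused in every room. This is a minimal sketch of the general idea, not the CQ algorithm itself; the grid size, doorway cells, hyperparameters, and all names (`learn_exit`, `step`, `EXITS`, and so on) are assumptions made for illustration.

```python
import random

ROOM = 5  # each room is assumed to be a ROOM x ROOM grid
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # primitive moves: up, down, left, right
EXITS = {"east": (ROOM // 2, ROOM - 1),     # assumed doorway cells within a room
         "south": (ROOM - 1, ROOM // 2)}

def cells():
    """All in-room positions; this variable changes on every step."""
    return [(r, c) for r in range(ROOM) for c in range(ROOM)]

def step(cell, move):
    """One primitive move inside a room; walls clip the motion."""
    r = min(max(cell[0] + move[0], 0), ROOM - 1)
    c = min(max(cell[1] + move[1], 0), ROOM - 1)
    return (r, c)

def learn_exit(goal, episodes=2000, eps=0.1, alpha=0.5, gamma=0.95):
    """Lower-level subtask: tabular Q-learning to reach one doorway.

    Given the room, the in-room position is Markov on its own, so each
    exit subtask is a small MDP that can be solved once and reused in
    every room; this reuse is where a hierarchical decomposition over
    state variables saves work compared with a flat solution.
    """
    Q = {(s, a): 0.0 for s in cells() for a in range(len(MOVES))}
    for _ in range(episodes):
        s = random.choice(cells())
        while s != goal:
            if random.random() < eps:
                a = random.randrange(len(MOVES))      # explore
            else:
                a = max(range(len(MOVES)), key=lambda a: Q[(s, a)])
            s2 = step(s, MOVES[a])
            reward = 0.0 if s2 == goal else -1.0      # step cost until the doorway
            best = max(Q[(s2, b)] for b in range(len(MOVES)))
            Q[(s, a)] += alpha * (reward + gamma * best - Q[(s, a)])
            s = s2
    return Q

# The higher-level MDP sees only the room variable; its abstract actions
# are the learned exit policies, which terminate on reaching a doorway.
exit_q = {name: learn_exit(goal) for name, goal in EXITS.items()}

# Greedy rollout of one abstract action from the room's top-left corner.
s, goal = (0, 0), EXITS["east"]
for _ in range(50):  # safety cap; the greedy path is much shorter
    if s == goal:
        break
    a = max(range(len(MOVES)), key=lambda a: exit_q["east"][(s, a)])
    s = step(s, MOVES[a])
print("reached" if s == goal else "missed", "east doorway at", s)
```

A full treatment would also learn the higher-level value function over rooms and generate the subtask termination conditions automatically, which is what the abstract credits the CQ algorithm with doing.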