Generating hierarchical structure in reinforcement learning from state variables

  • Authors: Bernhard Hengst
  • Affiliations: School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
  • Venue: PRICAI '00: Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence
  • Year: 2000

Abstract

This paper presents the CQ algorithm, which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs using state variables. The CQ algorithm uses a heuristic that applies to problems which can be modelled by a set of state variables conforming to a special ordering, defined in this paper as a "nested Markov ordering". The benefits of this approach are: (1) the automatic generation of actions and termination conditions at all levels in the hierarchy, and (2) linear scaling with the number of variables under certain conditions. The approach draws heavily on Dietterich's MAXQ value function decomposition and on the region-based decomposition of MDPs by Hauskrecht, Meuleau, Kaelbling, Dean, and Boutilier, among others. The CQ algorithm is described and its functionality illustrated using a four-room example. Solutions with different numbers of hierarchical levels are generated to solve Dietterich's taxi tasks.
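
As a rough illustration of the kind of decomposition the abstract describes, the sketch below splits the four-room example's state into a slowly changing room variable and a quickly changing in-room position, then solves the in-room navigation subtask once so it can be reused in every room. This is a minimal sketch of the general idea, not the CQ algorithm itself; the grid size, doorway cells, hyperparameters, and all names (`learn_exit`, `step`, `EXITS`, and so on) are assumptions made for illustration.

```python
import random

ROOM = 5  # each room is assumed to be a ROOM x ROOM grid
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # primitive moves: up, down, left, right
EXITS = {"east": (ROOM // 2, ROOM - 1),     # assumed doorway cells within a room
         "south": (ROOM - 1, ROOM // 2)}

def cells():
    """All in-room positions; this variable changes on every step."""
    return [(r, c) for r in range(ROOM) for c in range(ROOM)]

def step(cell, move):
    """One primitive move inside a room; walls clip the motion."""
    r = min(max(cell[0] + move[0], 0), ROOM - 1)
    c = min(max(cell[1] + move[1], 0), ROOM - 1)
    return (r, c)

def learn_exit(goal, episodes=2000, eps=0.1, alpha=0.5, gamma=0.95):
    """Lower-level subtask: tabular Q-learning to reach one doorway.

    Given the room, the in-room position is Markov on its own, so each
    exit subtask is a small MDP that can be solved once and reused in
    every room; this reuse is where a hierarchical decomposition over
    state variables saves work compared with a flat solution.
    """
    Q = {(s, a): 0.0 for s in cells() for a in range(len(MOVES))}
    for _ in range(episodes):
        s = random.choice(cells())
        while s != goal:
            if random.random() < eps:
                a = random.randrange(len(MOVES))      # explore
            else:
                a = max(range(len(MOVES)), key=lambda a: Q[(s, a)])
            s2 = step(s, MOVES[a])
            reward = 0.0 if s2 == goal else -1.0      # step cost until the doorway
            best = max(Q[(s2, b)] for b in range(len(MOVES)))
            Q[(s, a)] += alpha * (reward + gamma * best - Q[(s, a)])
            s = s2
    return Q

# The higher-level MDP sees only the room variable; its abstract actions
# are the learned exit policies, which terminate on reaching a doorway.
exit_q = {name: learn_exit(goal) for name, goal in EXITS.items()}

# Greedy rollout of one abstract action from the room's top-left corner.
s, goal = (0, 0), EXITS["east"]
for _ in range(50):  # safety cap; the greedy path is much shorter
    if s == goal:
        break
    a = max(range(len(MOVES)), key=lambda a: exit_q["east"][(s, a)])
    s = step(s, MOVES[a])
print("reached" if s == goal else "missed", "east doorway at", s)
```

A full treatment would also learn the higher-level value function over rooms and generate the subtask termination conditions automatically, which is what the abstract credits the CQ algorithm with doing.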