To establish autonomous behavior in technical systems, the well-known trade-off between reactive control and deliberative planning has to be considered. In this paper, we combine both principles in a two-level hierarchical reinforcement learning scheme that enables the system to autonomously determine suitable solutions to new tasks. The approach is based on a behavior representation specified by hybrid automata, which combine continuous and discrete dynamics, to predict (anticipate) the outcome of a sequence of actions. On the higher layer of the hierarchical scheme, the behavior is abstracted in the form of a finite state automaton, on which value iteration is performed to obtain a goal-leading sequence of subtasks. This sequence is realized on the lower layer by applying policy-gradient-based reinforcement learning to the hybrid automaton model. Iterating between both layers yields a consistent, goal-attaining behavior, as shown for a simple robot grasping task.
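The higher-layer step can be illustrated with a minimal sketch: run value iteration over a finite state automaton whose states are abstract situations and whose actions are subtasks, then read off the greedy goal-leading subtask sequence. The toy automaton below (approach/grasp/lift for a grasping scenario) and all names in it are hypothetical illustrations, not the paper's actual model.

```python
def value_iteration(states, actions, trans, goal, gamma=0.95, tol=1e-6):
    """Value iteration on a deterministic finite state automaton.

    trans[(s, a)] gives the successor state; a reward of 1.0 is
    received on transitions that reach the goal state.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue  # goal is absorbing, value stays 0
            best = max(
                (1.0 if trans[(s, a)] == goal else 0.0) + gamma * V[trans[(s, a)]]
                for a in actions if (s, a) in trans
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_plan(V, start, goal, actions, trans, gamma=0.95):
    """Extract the goal-leading subtask sequence greedily from V."""
    plan, s = [], start
    while s != goal:
        a = max(
            (a for a in actions if (s, a) in trans),
            key=lambda a: (1.0 if trans[(s, a)] == goal else 0.0)
                          + gamma * V[trans[(s, a)]],
        )
        plan.append(a)
        s = trans[(s, a)]
    return plan

# Hypothetical abstraction of a grasping task.
states = ["far", "near", "grasped", "lifted"]
actions = ["approach", "grasp", "lift"]
trans = {("far", "approach"): "near",
         ("near", "grasp"): "grasped",
         ("grasped", "lift"): "lifted"}
V = value_iteration(states, actions, trans, goal="lifted")
print(greedy_plan(V, "far", "lifted", actions, trans))
# → ['approach', 'grasp', 'lift']
```

In the full scheme, each subtask in the returned sequence would then be handed to the lower layer, where a policy-gradient learner refines its continuous realization on the hybrid automaton model.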