Reinforcement learning (RL) algorithms have traditionally been thought of as trial-and-error learning methods that use actual control experience to incrementally improve a control policy. Sutton's DYNA architecture demonstrated that RL algorithms can work equally well on simulated experience drawn from an environment model, and that the resulting computation is similar to one-step lookahead planning. Inspired by the literature on hierarchical planning, I propose learning a hierarchy of models of the environment that abstract temporal detail as a means of improving the scalability of RL algorithms. I present H-DYNA (Hierarchical DYNA), an extension to Sutton's DYNA architecture that is able to learn such a hierarchy of abstract models. H-DYNA differs from hierarchical planners in two ways: first, the abstract models are learned using experience gained while learning to solve other tasks in the same environment, and second, the abstract models can be used to solve stochastic control tasks. Simulations on a set of compositionally-structured navigation tasks show that H-DYNA can learn to solve them faster than conventional RL algorithms. The abstract models also serve as mechanisms for achieving transfer of learning across multiple tasks.
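The interleaving of direct RL updates with planning updates from a learned model, which the abstract attributes to Sutton's DYNA, can be illustrated with a minimal tabular Dyna-Q sketch. This is not the paper's H-DYNA (it has no hierarchy of abstract models); the environment, hyperparameters, and epsilon-greedy exploration are illustrative assumptions, and a deterministic environment is assumed so the one-step model can be a simple lookup table.

```python
import random
from collections import defaultdict

def dyna_q(step_fn, start, n_actions, episodes=100, planning_steps=20,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch: each real step triggers one direct Q-learning
    update, a model update, and `planning_steps` simulated (planning) updates.
    `step_fn(s, a)` must return (reward, next_state, done); assumed deterministic."""
    rng = random.Random(seed)
    Q = defaultdict(float)   # Q[(state, action)] -> value estimate
    model = {}               # model[(s, a)] -> (reward, next_state, done)

    def greedy(s):
        vals = [Q[(s, a)] for a in range(n_actions)]
        best = max(vals)
        return rng.choice([a for a, v in enumerate(vals) if v == best])

    def backup(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in range(n_actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step_fn(s, a)
            backup(s, a, r, s2, done)          # direct RL from actual experience
            model[(s, a)] = (r, s2, done)      # learn the one-step model
            for _ in range(planning_steps):    # planning from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                backup(ps, pa, pr, ps2, pdone)
            s = s2
    return Q

# Hypothetical five-state corridor: action 1 moves right toward the goal
# (state 4, reward 1); action 0 moves left. Dyna-Q's planning updates
# propagate the goal reward back through the chain quickly.
def corridor(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return (1.0, s2, True) if s2 == 4 else (0.0, s2, False)

Q = dyna_q(corridor, start=0, n_actions=2)
```

After training, the greedy policy at every non-goal state prefers moving right, since planning updates let a single real visit to the goal back up value through all previously visited state-action pairs.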