Reinforcement learning (RL) algorithms have traditionally been thought of as trial-and-error learning methods that use actual control experience to incrementally improve a control policy. Sutton's DYNA architecture demonstrated that RL algorithms can work equally well on simulated experience drawn from an environment model, and that the resulting computation is similar to one-step lookahead planning. Inspired by the literature on hierarchical planning, I propose learning a hierarchy of models of the environment that abstract temporal detail as a means of improving the scalability of RL algorithms. I present H-DYNA (Hierarchical DYNA), an extension to Sutton's DYNA architecture that is able to learn such a hierarchy of abstract models. H-DYNA differs from hierarchical planners in two ways: first, the abstract models are learned using experience gained while learning to solve other tasks in the same environment, and second, the abstract models can be used to solve stochastic control tasks. Simulations on a set of compositionally-structured navigation tasks show that H-DYNA can learn to solve them faster than conventional RL algorithms. The abstract models also serve as mechanisms for achieving transfer of learning across multiple tasks.
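The interleaving of direct RL updates with planning updates from a learned model, which the abstract attributes to Sutton's DYNA, can be illustrated with a minimal tabular Dyna-Q sketch. This is not the paper's H-DYNA (it has no hierarchy of abstract models); the environment, hyperparameters, and epsilon-greedy exploration are illustrative assumptions, and a deterministic environment is assumed so the one-step model can be a simple lookup table.

```python
import random
from collections import defaultdict

def dyna_q(step_fn, start, n_actions, episodes=100, planning_steps=20,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch: each real step triggers one direct Q-learning
    update, a model update, and `planning_steps` simulated (planning) updates.
    `step_fn(s, a)` must return (reward, next_state, done); assumed deterministic."""
    rng = random.Random(seed)
    Q = defaultdict(float)   # Q[(state, action)] -> value estimate
    model = {}               # model[(s, a)] -> (reward, next_state, done)

    def greedy(s):
        vals = [Q[(s, a)] for a in range(n_actions)]
        best = max(vals)
        return rng.choice([a for a, v in enumerate(vals) if v == best])

    def backup(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in range(n_actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step_fn(s, a)
            backup(s, a, r, s2, done)          # direct RL from actual experience
            model[(s, a)] = (r, s2, done)      # learn the one-step model
            for _ in range(planning_steps):    # planning from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                backup(ps, pa, pr, ps2, pdone)
            s = s2
    return Q

# Hypothetical five-state corridor: action 1 moves right toward the goal
# (state 4, reward 1); action 0 moves left. Dyna-Q's planning updates
# propagate the goal reward back through the chain quickly.
def corridor(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return (1.0, s2, True) if s2 == 4 else (0.0, s2, False)

Q = dyna_q(corridor, start=0, n_actions=2)
```

After training, the greedy policy at every non-goal state prefers moving right, since planning updates let a single real visit to the goal back up value through all previously visited state-action pairs.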