Learning state-action basis functions for hierarchical MDPs
Proceedings of the 24th international conference on Machine learning
Much past work on solving Markov decision processes (MDPs) using reinforcement learning (RL) has relied on combining parameter estimation methods with hand-designed function approximation architectures for representing value functions. Recently, there has been growing interest in a broader framework that combines representation discovery and control learning, where value functions are approximated using a linear combination of task-dependent basis functions learned during the course of solving a particular MDP. This paper introduces an approach to automatic basis function construction for hierarchical reinforcement learning (HRL). Our approach generalizes past work on basis construction to multi-level action hierarchies by forming a compressed representation of a semi-Markov decision process (SMDP) at multiple levels of temporal abstraction. The specific approach is based on hierarchical spectral analysis of graphs induced on an SMDP's state space from sample trajectories. We present experimental results on benchmark SMDPs, showing significant speedups when compared to hand-designed approximation architectures.
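The core of the spectral analysis described above is building a graph over sampled states and using the smoothest eigenvectors of its Laplacian as basis functions for value-function approximation. The following is a minimal sketch of that single-level step, assuming NumPy; the function name, the toy chain MDP, and the number of basis functions are illustrative, and the paper's hierarchical method would repeat a construction like this at each level of temporal abstraction.

```python
import numpy as np

def laplacian_basis(adjacency, k):
    """Return the k smoothest eigenvectors of the combinatorial graph
    Laplacian L = D - A, used as basis functions (columns) for
    linear value-function approximation. Illustrative sketch only."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh sorts eigenvalues in ascending order, so the first k columns
    # are the smoothest functions on the graph (lowest eigenvalues).
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, :k]

# Toy example: state graph of a 5-state chain, as might be induced
# from sample trajectories (hypothetical data, not from the paper).
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

phi = laplacian_basis(A, k=3)  # 5 states x 3 basis functions
# A value function is then approximated linearly as V ≈ phi @ w,
# where the weight vector w is estimated by an RL algorithm.
```

The first column of `phi` is the constant eigenvector (eigenvalue 0), and later columns capture progressively less smooth structure on the state graph, which is what makes them useful as a compressed, task-dependent representation.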