Reinforcement learning with hierarchies of machines
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning
ECML '02 Proceedings of the 13th European Conference on Machine Learning
Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Discovering Hierarchy in Reinforcement Learning with HEXQ
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Temporal abstraction in reinforcement learning
Recent Advances in Hierarchical Reinforcement Learning
Discrete Event Dynamic Systems
Identifying useful subgoals in reinforcement learning by local graph partitioning
ICML '05 Proceedings of the 22nd international conference on Machine learning
Causal Graph Based Decomposition of Factored MDPs
The Journal of Machine Learning Research
Automatic discovery and transfer of MAXQ hierarchies
Proceedings of the 25th international conference on Machine learning
Hierarchical reinforcement learning with the MAXQ value function decomposition
Journal of Artificial Intelligence Research
Automatic construction of temporally extended actions for MDPs using bisimulation metrics
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Unified inter and intra options learning using policy gradient methods
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
A sociologically inspired heuristic for optimization algorithms: A case study on ant systems
Expert Systems with Applications: An International Journal
We address the problem of single-agent, autonomous sequential decision making. We assume that some controllers or behavior policies are given as prior knowledge, and the task of the agent is to learn how to switch between these policies. We formulate the problem using the framework of reinforcement learning and options (Sutton, Precup & Singh, 1999; Precup, 2000). We derive gradient-based algorithms for learning the termination conditions of options, with the goal of optimizing the expected long-term return. We incorporate the proposed approach into policy-gradient methods with linear function approximation.
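As a rough illustration of what learning a termination condition by gradient ascent can look like, the sketch below parameterizes an option's termination probability as a logistic function of linear state features and applies a generic REINFORCE-style update. It is not the algorithm derived in the paper: the logistic form, the plain Monte Carlo return, and all names (`termination_prob`, `grad_log_termination`, `reinforce_update`, `theta`, `phi`) are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def termination_prob(theta, phi):
    """Assumed form: beta(s) = sigmoid(theta . phi(s)), with phi(s) a feature vector."""
    return sigmoid(sum(t * f for t, f in zip(theta, phi)))


def grad_log_termination(theta, phi, terminated):
    """Gradient w.r.t. theta of the log-probability of one terminate/continue decision.

    For a Bernoulli decision with p = sigmoid(theta . phi):
      d/dtheta log p       = (1 - p) * phi   (option terminated)
      d/dtheta log (1 - p) = -p * phi        (option continued)
    """
    p = termination_prob(theta, phi)
    coeff = (1.0 - p) if terminated else -p
    return [coeff * f for f in phi]


def reinforce_update(theta, decisions, ret, step_size):
    """One REINFORCE-style step (an illustrative stand-in, not the paper's update).

    decisions: list of (phi, terminated) pairs recorded during one episode.
    ret: the return observed for that episode.
    Gradients are evaluated at the old theta and accumulated into a new vector.
    """
    new_theta = list(theta)
    for phi, terminated in decisions:
        grad = grad_log_termination(theta, phi, terminated)
        new_theta = [t + step_size * ret * g for t, g in zip(new_theta, grad)]
    return new_theta
```

In this sketch the given behavior policies stay fixed and only `theta` changes, so a positive return makes the recorded terminate/continue decisions more likely in similar states, which mirrors the idea of learning when to switch between controllers.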