Automatic programming of behavior-based robots using reinforcement learning
Artificial Intelligence
TD-Gammon, a self-teaching backgammon program, achieves master-level play
Neural Computation
Average reward reinforcement learning: foundations, algorithms, and empirical results
Machine Learning (Special Issue on Reinforcement Learning)
Model-based average reward reinforcement learning
Artificial Intelligence
Elevator Group Control Using Multiple Reinforcement Learning Agents
Machine Learning
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
Learning Algorithms for Markov Decision Processes with Average Cost
SIAM Journal on Control and Optimization
Recent Advances in Hierarchical Reinforcement Learning
Discrete Event Dynamic Systems
Hierarchically Optimal Average Reward Reinforcement Learning
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Model-based Hierarchical Average-reward Reinforcement Learning
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Continuous-Time Hierarchical Reinforcement Learning
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Advances in Neural Information Processing Systems 5
State abstraction for programmable reinforcement learning agents
Eighteenth National Conference on Artificial Intelligence
Dynamic Programming
Hierarchical control and learning for Markov decision processes
Temporal abstraction in reinforcement learning
Programmable reinforcement learning agents
Hierarchical reinforcement learning with the MAXQ value function decomposition
Journal of Artificial Intelligence Research
A reinforcement learning approach to job-shop scheduling
IJCAI '95 Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence - Volume 2
Auto-exploratory average reward reinforcement learning
AAAI '96 Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 1
Learning Representation and Control in Markov Decision Processes: New Frontiers
Foundations and Trends in Machine Learning
Basis function construction for hierarchical reinforcement learning
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Reinforcement learning algorithms with function approximation: Recent advances and applications
Information Sciences: an International Journal
Hierarchical reinforcement learning (HRL) is a general framework for scaling reinforcement learning (RL) to problems with large state and action spaces by using the task (or action) structure to restrict the space of policies. Prior work in HRL, including HAMs, options, MAXQ, and PHAMs, has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. The average reward optimality criterion has been recognized as more appropriate than the discounted framework for a wide class of continuing tasks; yet although average reward RL has been studied for decades, prior work has been largely limited to flat policy representations.

In this paper, we develop a framework for HRL based on the average reward optimality criterion. We investigate two formulations of HRL based on the average reward SMDP model, in both discrete and continuous time. These formulations correspond to two notions of optimality previously explored in HRL: hierarchical optimality and recursive optimality. We present algorithms that learn hierarchically and recursively optimal average reward policies under discrete-time and continuous-time average reward SMDP models.

We use two automated guided vehicle (AGV) scheduling tasks as experimental testbeds to study the empirical performance of the proposed algorithms. The first is a relatively simple AGV scheduling task in which the hierarchically and recursively optimal policies differ; on this problem, we compare the proposed algorithms with three other HRL methods, including a hierarchically optimal discounted reward algorithm and a recursively optimal discounted reward algorithm. The second is a larger AGV scheduling task, which we model in both discrete and continuous time, using a hierarchical task decomposition for which the hierarchically and recursively optimal policies coincide. On this task, we compare the performance of the proposed algorithms with a hierarchically optimal discounted reward algorithm and a recursively optimal discounted reward algorithm, as well as a non-hierarchical average reward algorithm. The results show that the proposed hierarchical average reward algorithms converge to the same performance as their discounted reward counterparts.
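For context, the average reward (gain) optimality criterion invoked above is standardly defined as follows; the notation here is generic and may differ from the paper's own.

```latex
% Gain (average reward) of a stationary policy \pi in a discrete-time MDP,
% starting from state s:
g^{\pi}(s) = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}\!\left[\sum_{t=0}^{N-1} r(s_t, a_t) \,\middle|\, s_0 = s,\ \pi\right]

% In an SMDP, decisions are separated by random sojourn times \tau_i, and the
% gain averages reward over elapsed time rather than over decision epochs:
g^{\pi}(s) = \lim_{N \to \infty}
  \frac{\mathbb{E}\!\left[\sum_{i=0}^{N-1} r(s_i, a_i) \,\middle|\, s_0 = s,\ \pi\right]}
       {\mathbb{E}\!\left[\sum_{i=0}^{N-1} \tau_i \,\middle|\, s_0 = s,\ \pi\right]}

% A gain-optimal policy \pi^* satisfies g^{\pi^*}(s) \ge g^{\pi}(s)
% for every state s and every policy \pi.
```

Flat average reward methods of the kind the abstract contrasts with are typified by tabular updates in the spirit of R-learning (Schwartz, 1993). The abstract does not specify the baseline's exact update rule, so the sketch below is an illustrative assumption, with hypothetical names and step sizes.

```python
# Minimal sketch of a flat (non-hierarchical) tabular average-reward update
# in the style of R-learning. Illustrative only; not the paper's algorithm.
from collections import defaultdict

Q = defaultdict(float)  # average-adjusted action values, keyed by (state, action)
rho = 0.0               # running estimate of the average reward per step

def update(s, a, r, s_next, actions, alpha=0.1, beta=0.01):
    """Apply one update for the observed transition (s, a, r, s_next)."""
    global rho
    greedy = Q[(s, a)] == max(Q[(s, b)] for b in actions)
    best_next = max(Q[(s_next, b)] for b in actions)
    # No discount factor: the TD error subtracts the gain estimate instead.
    Q[(s, a)] += alpha * (r - rho + best_next - Q[(s, a)])
    if greedy:
        # Revise the gain estimate only after greedy (non-exploratory) actions.
        rho += beta * (r + best_next - max(Q[(s, b)] for b in actions) - rho)
```

The hierarchical algorithms developed in the paper extend this average-adjusted style of update to value functions maintained over a task hierarchy, under either the hierarchical or the recursive notion of optimality.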