Automatic programming of behavior-based robots using reinforcement learning
Artificial Intelligence
TD-Gammon, a self-teaching backgammon program, achieves master-level play
Neural Computation
Average reward reinforcement learning: foundations, algorithms, and empirical results
Machine Learning (Special Issue on Reinforcement Learning)
Model-based average reward reinforcement learning
Artificial Intelligence
Elevator Group Control Using Multiple Reinforcement Learning Agents
Machine Learning
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
Learning Algorithms for Markov Decision Processes with Average Cost
SIAM Journal on Control and Optimization
Recent Advances in Hierarchical Reinforcement Learning
Discrete Event Dynamic Systems
Hierarchically Optimal Average Reward Reinforcement Learning
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Model-based Hierarchical Average-reward Reinforcement Learning
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Continuous-Time Hierarchical Reinforcement Learning
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Advances in Neural Information Processing Systems 5
State abstraction for programmable reinforcement learning agents
Eighteenth National Conference on Artificial Intelligence
Dynamic Programming
Hierarchical control and learning for Markov decision processes
Temporal abstraction in reinforcement learning
Programmable reinforcement learning agents
Hierarchical reinforcement learning with the MAXQ value function decomposition
Journal of Artificial Intelligence Research
A reinforcement learning approach to job-shop scheduling
IJCAI '95 Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence - Volume 2
Auto-exploratory average reward reinforcement learning
AAAI '96 Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 1
Learning Representation and Control in Markov Decision Processes: New Frontiers
Foundations and Trends in Machine Learning
Basis function construction for hierarchical reinforcement learning
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Reinforcement learning algorithms with function approximation: Recent advances and applications
Information Sciences: an International Journal
Hierarchical reinforcement learning (HRL) is a general framework for scaling reinforcement learning (RL) to problems with large state and action spaces by using the task (or action) structure to restrict the space of policies. Prior work in HRL, including HAMs, options, MAXQ, and PHAMs, has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. The average reward optimality criterion has been recognized as more appropriate than the discounted framework for a wide class of continuing tasks; yet although average reward RL has been studied for decades, prior work has been largely limited to flat policy representations.

In this paper, we develop a framework for HRL based on the average reward optimality criterion. We investigate two formulations of HRL based on the average reward SMDP model, in both discrete and continuous time. These formulations correspond to two notions of optimality previously explored in HRL: hierarchical optimality and recursive optimality. We present algorithms that learn hierarchically and recursively optimal average reward policies under discrete-time and continuous-time average reward SMDP models.

We use two automated guided vehicle (AGV) scheduling tasks as experimental testbeds to study the empirical performance of the proposed algorithms. The first is a relatively simple AGV scheduling task in which the hierarchically and recursively optimal policies differ; on this problem, we compare the proposed algorithms with three other HRL methods, including a hierarchically optimal discounted reward algorithm and a recursively optimal discounted reward algorithm. The second is a larger AGV scheduling task, which we model in both discrete and continuous time, using a hierarchical task decomposition for which the hierarchically and recursively optimal policies coincide. On this task, we compare the performance of the proposed algorithms with a hierarchically optimal discounted reward algorithm and a recursively optimal discounted reward algorithm, as well as a non-hierarchical average reward algorithm. The results show that the proposed hierarchical average reward algorithms converge to the same performance as their discounted reward counterparts.
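For context, the average reward (gain) optimality criterion invoked above is standardly defined as follows; the notation here is generic and may differ from the paper's own.

```latex
% Gain (average reward) of a stationary policy \pi in a discrete-time MDP,
% starting from state s:
g^{\pi}(s) = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}\!\left[\sum_{t=0}^{N-1} r(s_t, a_t) \,\middle|\, s_0 = s,\ \pi\right]

% In an SMDP, decisions are separated by random sojourn times \tau_i, and the
% gain averages reward over elapsed time rather than over decision epochs:
g^{\pi}(s) = \lim_{N \to \infty}
  \frac{\mathbb{E}\!\left[\sum_{i=0}^{N-1} r(s_i, a_i) \,\middle|\, s_0 = s,\ \pi\right]}
       {\mathbb{E}\!\left[\sum_{i=0}^{N-1} \tau_i \,\middle|\, s_0 = s,\ \pi\right]}

% A gain-optimal policy \pi^* satisfies g^{\pi^*}(s) \ge g^{\pi}(s)
% for every state s and every policy \pi.
```

Flat average reward methods of the kind the abstract contrasts with are typified by tabular updates in the spirit of R-learning (Schwartz, 1993). The abstract does not specify the baseline's exact update rule, so the sketch below is an illustrative assumption, with hypothetical names and step sizes.

```python
# Minimal sketch of a flat (non-hierarchical) tabular average-reward update
# in the style of R-learning. Illustrative only; not the paper's algorithm.
from collections import defaultdict

Q = defaultdict(float)  # average-adjusted action values, keyed by (state, action)
rho = 0.0               # running estimate of the average reward per step

def update(s, a, r, s_next, actions, alpha=0.1, beta=0.01):
    """Apply one update for the observed transition (s, a, r, s_next)."""
    global rho
    greedy = Q[(s, a)] == max(Q[(s, b)] for b in actions)
    best_next = max(Q[(s_next, b)] for b in actions)
    # No discount factor: the TD error subtracts the gain estimate instead.
    Q[(s, a)] += alpha * (r - rho + best_next - Q[(s, a)])
    if greedy:
        # Revise the gain estimate only after greedy (non-exploratory) actions.
        rho += beta * (r + best_next - max(Q[(s, b)] for b in actions) - rho)
```

The hierarchical algorithms developed in the paper extend this average-adjusted style of update to value functions maintained over a task hierarchy, under either the hierarchical or the recursive notion of optimality.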