Economic hierarchical Q-learning

Authors:
Erik G. Schultink;Ruggiero Cavallo;David C. Parkes
Affiliations:
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA;School of Engineering and Applied Sciences, Harvard University, Cambridge, MA;School of Engineering and Applied Sciences, Harvard University, Cambridge, MA
Venue:
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Year:
2008

Abstract

Hierarchical state decompositions address the curse of dimensionality in Q-learning methods for reinforcement learning (RL) but can suffer from suboptimality. To address this, we introduce the Economic Hierarchical Q-Learning (EHQ) algorithm for hierarchical RL. EHQ uses subsidies to align interests, so that agents that would otherwise converge to a recursively optimal policy are instead motivated to act in a hierarchically optimal way. The essential idea is that a parent pays a child for the relative value, to the rest of the system, of "returning the world" in one state rather than another. The resulting learning framework is simple compared to other algorithms that obtain hierarchical optimality. Additionally, EHQ encapsulates at each node the relevant information about value tradeoffs faced across the hierarchy and requires minimal data exchange between nodes. We provide no theoretical proof of hierarchical optimality but are able to demonstrate success with EHQ in empirical results.
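
The subsidy idea can be illustrated with a toy sketch. This is not the EHQ algorithm as specified in the paper; it only assumes that a parent knows a value for each state in which a child can terminate, and that the subsidy is that value measured against the worst exit. All names here (`exit_subsidies`, `child_q_update`, `learn_greedy_action`, the states "s", "A", "B", and the reward numbers) are hypothetical.

```python
def exit_subsidies(parent_values, exit_states):
    """Relative value of each exit state, measured against the worst exit.

    Assumption: the baseline is the minimum parent value over the exits.
    """
    baseline = min(parent_values[s] for s in exit_states)
    return {s: parent_values[s] - baseline for s in exit_states}


def child_q_update(q, state, action, reward, next_state, actions,
                   subsidies, alpha=0.5, gamma=1.0):
    """One tabular Q-learning step for the child subtask.

    Exit states are terminal for the child; reaching one earns its subsidy.
    """
    if next_state in subsidies:
        target = reward + subsidies[next_state]
    else:
        best_next = max(q.get((next_state, a), 0.0) for a in actions)
        target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


def learn_greedy_action(subsidies, sweeps=50):
    """Train the child on a one-step toy task and return its greedy action.

    From state "s", action "near" exits at "A" with reward -1 and
    action "far" exits at "B" with reward -2 (both deterministic).
    """
    transitions = {"near": ("A", -1.0), "far": ("B", -2.0)}
    actions = list(transitions)
    q = {}
    for _ in range(sweeps):
        for action, (next_state, reward) in transitions.items():
            child_q_update(q, "s", action, reward, next_state,
                           actions, subsidies)
    return max(actions, key=lambda a: q[("s", a)])


# Hypothetical parent values: exiting at "B" is worth more to the rest
# of the hierarchy than exiting at "A".
parent_values = {"A": 0.0, "B": 5.0}
subsidies = exit_subsidies(parent_values, ["A", "B"])

# Without subsidies the child takes the locally cheaper exit.
unsubsidized_choice = learn_greedy_action({"A": 0.0, "B": 0.0})
# With subsidies the child accounts for what the parent gains.
subsidized_choice = learn_greedy_action(subsidies)
```

In this toy setting the unsubsidized child picks the exit that is cheapest for itself, which corresponds to recursively optimal behavior, while the subsidized child picks the exit that is better for the system as a whole. The only data passed from parent to child is the per-exit subsidy.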