Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Hierarchical reinforcement learning with the MAXQ value function decomposition
Journal of Artificial Intelligence Research
Inspired by recent biological evidence, we propose an emotion-based hierarchical reinforcement learning (HRL) algorithm to improve the adaptivity of HRL and accelerate learning in environments with multiple sources of reward. In the algorithm, each reward source defines a subtask, and each subtask is assigned an artificial emotion indication (AEI) that predicts the reward component associated with that subtask. The AEIs can be learned simultaneously with the top-level policy and used to interrupt subtask execution when they change significantly. The algorithm is tested in a simulated gridworld that has two sources of reward and is partially observable. The experimental results show that the emotion mechanism enables efficient reuse of the subtask policies. Use of the artificial emotion variables significantly accelerates learning and achieves higher long-term reward than a human-designed policy and a restricted form of the MAXQ algorithm.
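The interruption mechanism described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the TD(0)-style AEI update rule, the `interrupt_threshold` parameter, and the greedy subtask-switching rule are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of the emotion-based HRL loop: each subtask tracks an
# artificial emotion indication (AEI) that predicts its reward component, and
# a large change in the active subtask's AEI triggers an interruption.
# Update rule, threshold, and switching policy are assumptions, not the paper's.

class Subtask:
    def __init__(self, alpha=0.1):
        self.aei = 0.0      # AEI: current prediction of this subtask's reward component
        self.alpha = alpha  # learning rate for the TD(0)-style AEI update

    def update_aei(self, reward_component):
        # track the reward component with a simple TD(0)-style update
        delta = reward_component - self.aei
        self.aei += self.alpha * delta
        return abs(delta)   # magnitude of change, used as the interruption signal


def run_episode(subtasks, reward_trace, interrupt_threshold=0.5):
    """Step through per-subtask reward components; interrupt the active
    subtask when its AEI shifts by more than the threshold."""
    active = 0
    interruptions = []
    for t, rewards in enumerate(reward_trace):  # rewards: one component per subtask
        change = subtasks[active].update_aei(rewards[active])
        # AEIs of inactive subtasks are kept up to date as well
        for i, st in enumerate(subtasks):
            if i != active:
                st.update_aei(rewards[i])
        if change > interrupt_threshold:
            interruptions.append(t)
            # a learned top-level policy would pick the next subtask;
            # here we switch greedily to the subtask with the highest AEI
            active = max(range(len(subtasks)), key=lambda i: subtasks[i].aei)
    return interruptions
```

For example, with two subtasks and a sudden jump in the first reward component, `run_episode([Subtask(), Subtask()], [(0, 1), (0, 1), (5, 1)])` reports an interruption at the step where the jump occurs.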