Several researchers have proposed reinforcement learning methods that gain learning advantages by using temporally extended actions, or macro-actions, but none has carefully analyzed what these advantages are. In this paper, we separate and analyze two advantages of using macro-actions in reinforcement learning: the effect on exploratory behavior, independent of learning, and the effect on the speed with which the learning process propagates accurate value information. We empirically measure the separate contributions of these two effects in gridworld and simulated robotic environments. In these environments, both effects were significant, but the effect of value propagation was larger. We also compare the accelerations of value propagation due to macro-actions and eligibility traces in the gridworld environment. Although eligibility traces increased the rate of convergence to the optimal value function relative to learning with macro-actions alone, they did not allow the optimal policy to be learned as quickly as it was with macro-actions.
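To make the setup concrete, the following is a minimal sketch (not the paper's exact experimental setup) of SMDP-style Q-learning in a small gridworld, where each macro-action is a hypothetical fixed sequence of primitive moves and the update uses the discounted return accumulated over the macro's duration. The grid size, macro definitions, and learning parameters are all illustrative assumptions.

```python
import random

# Illustrative sketch: Q-learning over primitives plus macro-actions,
# with an SMDP-style backup Q(s,a) += alpha * (R + gamma^k * max Q(s') - Q(s,a)),
# where R is the discounted reward accumulated during the k-step action.

SIZE = 5
GOAL = (SIZE - 1, SIZE - 1)
PRIMITIVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
# Two hypothetical macro-actions: fixed sequences of primitive moves.
MACROS = {"right3": ["right"] * 3, "down3": ["down"] * 3}
ACTIONS = list(PRIMITIVES) + list(MACROS)
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.1

def step(state, primitive):
    """Apply one primitive move; walls keep the agent in place."""
    r, c = state
    dr, dc = PRIMITIVES[primitive]
    nr = min(max(r + dr, 0), SIZE - 1)
    nc = min(max(c + dc, 0), SIZE - 1)
    return (nr, nc), -1.0  # constant step cost until the goal

def execute(state, action):
    """Run a primitive or macro-action; return the next state, the
    discounted return accumulated along the way, and the duration k."""
    seq = MACROS.get(action, [action])
    total, k = 0.0, 0
    for prim in seq:
        state, reward = step(state, prim)
        total += (GAMMA ** k) * reward
        k += 1
        if state == GOAL:  # macros terminate early at the goal
            break
    return state, total, k

Q = {(r, c): {a: 0.0 for a in ACTIONS} for r in range(SIZE) for c in range(SIZE)}
random.seed(0)

for episode in range(500):
    s = (0, 0)
    while s != GOAL:
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(Q[s], key=Q[s].get))
        s2, ret, k = execute(s, a)
        # Goal state is absorbing with value 0.
        target = ret + (GAMMA ** k) * max(Q[s2].values()) * (s2 != GOAL)
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# Greedy rollout: the learned policy should reach the goal quickly.
s, steps = (0, 0), 0
while s != GOAL and steps < 50:
    s, _, k = execute(s, max(Q[s], key=Q[s].get))
    steps += k
print("reached goal in", steps, "primitive steps")
```

Because a macro commits the agent to several primitive steps, it both biases exploration (the agent travels farther per decision) and backs up value information over longer jumps in a single update; the two effects the abstract separates. Swapping the `MACROS` dictionary for an empty one recovers plain one-step Q-learning for comparison.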