Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence.
Technical Update: Least-Squares Temporal Difference Learning. Machine Learning.
Least Squares Policy Evaluation Algorithms with Linear Function Approximation. Discrete Event Dynamic Systems.
Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning. ECML '02: Proceedings of the 13th European Conference on Machine Learning.
Shaping multi-agent systems with gradient reinforcement learning. Autonomous Agents and Multi-Agent Systems.
Learning complex motions by sequencing simpler motion templates. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Natural actor-critic algorithms. Automatica (Journal of IFAC).
Optimal policy switching algorithms for reinforcement learning. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, Volume 1.
Temporally extended actions (or macro-actions) have proven useful for speeding up planning and learning, adding robustness, and building prior knowledge into AI systems. The options framework, introduced in Sutton, Precup and Singh (1999), provides a natural way to incorporate macro-actions into reinforcement learning. In the subgoals approach, learning is divided into two phases: first learning each option with a prescribed subgoal, and then learning to compose the learned options together. In this paper we offer a unified framework for concurrent inter- and intra-option learning. To that end, we propose a modular parameterization of the intra-option policies together with the option termination conditions and the option selection (inter-option) policy, and show that these three decision components may be viewed as a unified policy over an augmented state-action space, to which standard policy gradient algorithms may be applied. We identify the basis functions that apply to each of these decision components, and show that they possess a useful orthogonality property that allows the natural gradient to be computed independently for each component. We further outline the extension of the suggested framework to several levels of option hierarchy, and conclude with a brief illustrative example.
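
To make the augmented-policy view concrete, the following Python sketch shows one plausible reading of the abstract: the augmented state is a (state, option) pair, and the log-probability of an augmented decision decomposes additively across the three components. The tabular sigmoid/softmax parameterization and all names (theta_term, theta_select, theta_intra) are illustrative assumptions, not the paper's actual construction.

    import numpy as np

    # Hypothetical modular parameters for the three decision components;
    # one block per component, disjoint from the others.
    rng = np.random.default_rng(0)
    n_states, n_options, n_actions = 5, 2, 3
    theta_term = rng.normal(size=(n_options, n_states))              # termination logits, beta_o(s)
    theta_select = rng.normal(size=(n_options, n_states))            # option-selection logits, mu(o|s)
    theta_intra = rng.normal(size=(n_options, n_states, n_actions))  # intra-option logits, pi_o(a|s)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def log_prob_augmented(s, o, terminate, o_next, a):
        """Log-probability of one decision of the unified policy over the
        augmented state (s, o): terminate or continue the current option,
        reselect an option on termination, then pick a primitive action
        with the active option's intra-option policy."""
        beta = sigmoid(theta_term[o, s])
        lp = np.log(beta) if terminate else np.log(1.0 - beta)
        if terminate:
            lp += np.log(softmax(theta_select[:, s])[o_next])
        else:
            o_next = o  # the current option continues
        lp += np.log(softmax(theta_intra[o_next, s])[a])
        return lp

    # Example: in state 1 under option 0, terminate, switch to option 1, take action 2.
    print(log_prob_augmented(s=1, o=0, terminate=True, o_next=1, a=2))

Because the log-probability above is a sum of three terms, each depending on a disjoint parameter block, the score function splits component-wise; under such a decomposition the Fisher information matrix is block-diagonal, which is one way to read the orthogonality property that, per the abstract, allows the natural gradient to be computed independently for each component.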