Learning, planning, and representing knowledge in large state spaces at multiple levels of temporal abstraction are key, long-standing challenges for building flexible autonomous agents. The options framework provides a formal mechanism for specifying and learning temporally extended skills. Although past work has demonstrated the benefit of acting according to options in continuous state spaces, one of the central advantages of temporal abstraction, the ability to plan using a temporally abstract model, remains a challenging problem when the number of environment states is large or infinite. In this work, we develop a knowledge construct, the linear option, which is capable of modeling temporally abstract dynamics in continuous state spaces. We show that planning with a linear expectation model of an option's dynamics converges to a fixed point with low Temporal Difference (TD) error. Next, building on recent work on linear feature selection, we show conditions under which a linear feature set is sufficient for accurately representing the value function of an option policy. We extend this result to show conditions under which multiple options may be repeatedly composed to create new options with accurate linear models. Finally, we demonstrate linear option learning and planning algorithms in a simulated robot environment.
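The abstract's claim that planning with a linear expectation model converges to a low-TD-error fixed point can be illustrated with a minimal sketch. The names below (`F`, `b`, `theta`) are illustrative, not the paper's notation: we assume a feature map phi(s), a learned matrix `F` predicting the expected (discounted) feature vector at option termination, and a vector `b` whose inner product with phi(s) gives the expected cumulative reward during the option. Under a contraction assumption (spectral radius of `F` below 1), iterating the model-based Bellman update converges to a unique fixed point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # number of features (illustrative)

# Assumed linear option model (learned elsewhere, e.g. by least squares):
#   F @ phi(s)  ~ expected discounted feature vector at option termination
#   b @ phi(s)  ~ expected cumulative reward while the option executes
F = rng.uniform(-1, 1, (n, n))
F *= 0.9 / np.max(np.abs(np.linalg.eigvals(F)))  # enforce spectral radius < 1
b = rng.uniform(-1, 1, n)

# Planning: iterate the model-based Bellman update theta <- b + F^T theta,
# where the approximate value function is v(s) = theta @ phi(s).
theta = np.zeros(n)
for _ in range(1000):
    theta = b + F.T @ theta

# With spectral radius < 1, the iteration converges to the unique fixed
# point theta* = (I - F^T)^{-1} b.
theta_star = np.linalg.solve(np.eye(n) - F.T, b)
assert np.allclose(theta, theta_star, atol=1e-6)
```

This is only a sketch of the planning step under stated assumptions; the paper additionally analyzes when a feature set suffices to represent an option policy's value function and when composed options retain accurate linear models.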