One of the original motivations for using temporally extended actions, or options, in reinforcement learning was to enable the transfer of learned value functions or policies to new problems. Many researchers have used options to speed learning on single problems, but options have not been studied in depth as a tool for transfer. In this paper we introduce a formal model of a learning problem as a distribution over Markov decision processes (MDPs), where each MDP represents a task the agent must solve. The model can also be viewed as a partially observable Markov decision process (POMDP) with a special structure that we describe. We study two learning algorithms: one that maintains a single value function that generalizes across tasks, and an incremental POMDP-inspired method that maintains a separate value function for each task. We evaluate both algorithms on an extension of the Mountain Car domain, measuring both learning speed and asymptotic performance. Empirically, we find that temporally extended options facilitate transfer for both algorithms. In our domain, the single-value-function algorithm learns much faster because it generalizes its experience more broadly across tasks. We also observe that different sets of options achieve different trade-offs between learning speed and asymptotic performance.
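To make the "single value function shared across tasks" idea concrete, here is a minimal sketch of SMDP Q-learning with options over a distribution of tasks. This is not the authors' code: the task interface (Task.reset, Task.execute_option, TaskDistribution.sample) and all hyperparameters are illustrative assumptions. The key points it shows are that one Q-table is indexed by state and option only, so experience generalizes across sampled tasks, and that a k-step option is handled with the standard SMDP discount of gamma**k.

```python
import random
from collections import defaultdict

# Hypothetical hyperparameters, chosen only for illustration.
GAMMA, ALPHA, EPSILON = 0.99, 0.1, 0.1

# Single value function shared across all tasks: Q[(state, option)].
Q = defaultdict(float)

def epsilon_greedy(state, options):
    """Pick a random option with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.choice(options)
    return max(options, key=lambda o: Q[(state, o)])

def run_episode(task, options):
    """One episode on one sampled task. Because Q is indexed by state only,
    not by task identity, learning on this task updates values used by all."""
    state = task.reset()
    done = False
    while not done:
        option = epsilon_greedy(state, options)
        # Execute the (possibly multi-step) option until it terminates.
        # 'reward' is the discounted cumulative reward accrued during the
        # option's execution, and k is the number of steps it took.
        next_state, reward, k, done = task.execute_option(state, option)
        # SMDP Q-learning update: discount the bootstrap target by gamma**k.
        target = reward
        if not done:
            target += GAMMA ** k * max(Q[(next_state, o)] for o in options)
        Q[(state, option)] += ALPHA * (target - Q[(state, option)])
        state = next_state

def train(task_distribution, options, num_episodes=100):
    """Repeatedly sample a task from the distribution and learn on it."""
    for _ in range(num_episodes):
        run_episode(task_distribution.sample(), options)
```

The abstract's second, POMDP-inspired algorithm would instead key the table as Q[(task_id, state, option)], keeping a separate value function per task and giving up the cross-task generalization that makes the shared variant faster in the reported experiments.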