One of the original motivations for using temporally extended actions, or options, in reinforcement learning was to enable the transfer of learned value functions or policies to new problems. Many researchers have used options to speed learning on single problems, but options have not been studied in depth as a tool for transfer. In this paper we introduce a formal model of a learning problem as a distribution over Markov decision processes (MDPs), where each MDP represents a task the agent must solve. The model can also be viewed as a partially observable Markov decision process (POMDP) with a special structure that we describe. We study two learning algorithms: one that maintains a single value function that generalizes across tasks, and an incremental POMDP-inspired method that maintains a separate value function for each task. We evaluate both algorithms on an extension of the Mountain Car domain, measuring both learning speed and asymptotic performance. Empirically, we find that temporally extended options facilitate transfer for both algorithms. In our domain, the single-value-function algorithm learns much faster because it generalizes its experience more broadly across tasks. We also observe that different sets of options achieve different trade-offs between learning speed and asymptotic performance.
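To make the "single value function shared across tasks" idea concrete, here is a minimal sketch of SMDP Q-learning with options over a distribution of tasks. This is not the authors' code: the task interface (Task.reset, Task.execute_option, TaskDistribution.sample) and all hyperparameters are illustrative assumptions. The key points it shows are that one Q-table is indexed by state and option only, so experience generalizes across sampled tasks, and that a k-step option is handled with the standard SMDP discount of gamma**k.

```python
import random
from collections import defaultdict

# Hypothetical hyperparameters, chosen only for illustration.
GAMMA, ALPHA, EPSILON = 0.99, 0.1, 0.1

# Single value function shared across all tasks: Q[(state, option)].
Q = defaultdict(float)

def epsilon_greedy(state, options):
    """Pick a random option with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.choice(options)
    return max(options, key=lambda o: Q[(state, o)])

def run_episode(task, options):
    """One episode on one sampled task. Because Q is indexed by state only,
    not by task identity, learning on this task updates values used by all."""
    state = task.reset()
    done = False
    while not done:
        option = epsilon_greedy(state, options)
        # Execute the (possibly multi-step) option until it terminates.
        # 'reward' is the discounted cumulative reward accrued during the
        # option's execution, and k is the number of steps it took.
        next_state, reward, k, done = task.execute_option(state, option)
        # SMDP Q-learning update: discount the bootstrap target by gamma**k.
        target = reward
        if not done:
            target += GAMMA ** k * max(Q[(next_state, o)] for o in options)
        Q[(state, option)] += ALPHA * (target - Q[(state, option)])
        state = next_state

def train(task_distribution, options, num_episodes=100):
    """Repeatedly sample a task from the distribution and learn on it."""
    for _ in range(num_episodes):
        run_episode(task_distribution.sample(), options)
```

The abstract's second, POMDP-inspired algorithm would instead key the table as Q[(task_id, state, option)], keeping a separate value function per task and giving up the cross-task generalization that makes the shared variant faster in the reported experiments.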