Transfer in variable-reward hierarchical reinforcement learning

  • Authors:
  • Neville Mehta; Sriraam Natarajan; Prasad Tadepalli; Alan Fern

  • Affiliations:
  • School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA

  • Venue:
  • Machine Learning
  • Year:
  • 2008

Abstract

Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them. Furthermore, we introduce an online algorithm, Variable-Reward Reinforcement Learning (VRRL), that solves this problem by compactly storing the optimal value functions for several SMDPs and using them to optimally initialize the value function for a new SMDP. We generalize our method to a hierarchical RL setting where the different SMDPs share the same task hierarchy. Our experimental results in a simplified real-time strategy domain show that significant transfer learning occurs in both flat and hierarchical settings. Transfer is especially effective in the hierarchical setting, where the overall value functions are decomposed into subtask value functions that are more widely amenable to transfer across different SMDPs.
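
The core idea in the abstract, that rewards linear in a set of reward features let value functions be stored once and re-scalarized for a new task's reward weights, can be illustrated with a short sketch. The Python below is a minimal, hypothetical illustration, not the paper's implementation: class and method names are our own, and it assumes a tabular setting where each previously solved SMDP contributes a vector-valued value function that is combined with new reward weights to initialize learning on the new SMDP.

```python
import numpy as np

# Sketch of the variable-reward transfer idea, assuming R(s, a) = w . phi(s, a)
# for a k-dimensional reward-feature vector phi and weight vector w.
# All names here are illustrative placeholders, not from the paper.

class ValueFunctionLibrary:
    """Stores vector-valued value functions V_i(s) in R^k learned on earlier SMDPs.

    For a new reward weight vector w, the scalar value of state s under stored
    policy i is w . V_i(s); taking the state-wise maximum over the library gives
    a starting value estimate for the new SMDP.
    """

    def __init__(self, n_states, n_reward_features):
        self.n_states = n_states
        self.k = n_reward_features
        self.stored = []  # list of arrays with shape (n_states, k)

    def add(self, vector_value_function):
        # Record the vector-valued value function learned on a solved SMDP.
        self.stored.append(np.asarray(vector_value_function, dtype=float))

    def initialize(self, w):
        """Return an initial scalar value function for a new SMDP with weights w."""
        if not self.stored:
            return np.zeros(self.n_states)
        # Scalarize every stored vector value function with the new weights,
        # then take the per-state maximum as the optimistic initialization.
        scalarized = np.stack([V @ w for V in self.stored])  # (num_stored, n_states)
        return scalarized.max(axis=0)


# Usage: two previously solved SMDPs over 4 states with 2 reward features.
lib = ValueFunctionLibrary(n_states=4, n_reward_features=2)
lib.add([[1.0, 0.0], [2.0, 0.5], [0.0, 1.0], [0.5, 0.5]])
lib.add([[0.0, 2.0], [1.0, 1.0], [0.5, 0.0], [2.0, 0.2]])
v_init = lib.initialize(np.array([0.3, 0.7]))  # new task's reward weights
print(v_init)
```

In the hierarchical setting described in the abstract, one could maintain such a library per subtask of the shared task hierarchy rather than for the overall value function, which is what makes the decomposed value functions more widely transferable across SMDPs.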