Transfer algorithms allow knowledge previously learned on related tasks to be reused to speed up learning on the current task. Recently, many complex reinforcement learning problems have been solved successfully by efficient transfer learners. However, most of these algorithms suffer from a serious flaw: they are implicitly tuned to transfer knowledge between tasks with a given degree of similarity. In other words, if the previous task is very dissimilar to the current task, the transfer process may actually slow down learning; if it is nearly identical, the resulting speed-up may fall far short of what is achievable. In this paper, we address this issue by explicitly optimizing the transfer rate between tasks, answering the question: "can the transfer rate be accurately optimized, and at what cost?" We show that this optimization problem is closely related to the continuum bandit problem. Based on this relation, we design a generic adaptive transfer method, which we evaluate on a grid-world task.
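The abstract casts transfer-rate optimization as a continuum bandit problem. As a rough illustration of that framing (not the paper's actual algorithm), the sketch below discretizes the interval [0, 1] of candidate transfer rates into a finite set of arms and selects among them with UCB1, treating the return of each learning episode as the bandit reward. The names `run_episode` and `CANDIDATE_RATES` are hypothetical placeholders, as is the simulated payoff.

    import math
    import random

    # Hypothetical discretization of [0, 1] into candidate transfer rates.
    CANDIDATE_RATES = [i / 10 for i in range(11)]

    def run_episode(transfer_rate):
        """Placeholder for one learning episode on the target task.

        A real implementation would mix transferred knowledge with fresh
        learning according to `transfer_rate` and return the episode's
        return. Here we simulate a noisy payoff peaked at an unknown
        optimal rate so the sketch runs end to end.
        """
        optimal = 0.3  # unknown to the bandit
        return 1.0 - (transfer_rate - optimal) ** 2 + random.gauss(0, 0.05)

    def ucb1_transfer(num_episodes=500):
        """Select a transfer rate per episode via the UCB1 index."""
        counts = [0] * len(CANDIDATE_RATES)
        means = [0.0] * len(CANDIDATE_RATES)
        for t in range(1, num_episodes + 1):
            if t <= len(CANDIDATE_RATES):
                arm = t - 1  # play each arm once first
            else:
                arm = max(
                    range(len(CANDIDATE_RATES)),
                    key=lambda a: means[a]
                    + math.sqrt(2 * math.log(t) / counts[a]),
                )
            reward = run_episode(CANDIDATE_RATES[arm])
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        best = max(range(len(CANDIDATE_RATES)), key=lambda a: means[a])
        return CANDIDATE_RATES[best]

    if __name__ == "__main__":
        print("estimated best transfer rate:", ucb1_transfer())

A finer grid, or a dedicated continuum-bandit strategy that adaptively refines promising regions of [0, 1], would trade resolution against exploration cost; the coarse grid above merely keeps the sketch short.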