We study the problem of knowledge reuse by a reinforcement learning agent. We are interested in how an agent can exploit policies learned in the past to learn a new task more efficiently in the present. Our approach is to elicit spatial hints from an expert suggesting the world states in which each existing policy should be most relevant to the new task. By combining these hints with domain exploration, the agent is able to detect those portions of existing policies that are beneficial to the new task, and thus learns a new policy more efficiently. We call our approach Spatial Hints Policy Reuse (SHPR). Experiments demonstrate the effectiveness and robustness of our method. Our results encourage further study of how much efficiency can be gained from eliciting very simple advice from humans.
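The abstract does not give the algorithm itself, but the general idea of biasing action selection toward a past policy inside its hinted state region can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the names (`Hint`, `reuse_action`), the reuse probability `p_reuse`, and the fallback to uniform exploration are all assumptions.

```python
import random

class Hint:
    """Illustrative container pairing a past policy with the states
    where an expert hints it may be relevant (not from the paper)."""
    def __init__(self, states, policy):
        self.states = set(states)   # hinted region for this past policy
        self.policy = policy        # callable: state -> action

def reuse_action(state, hints, new_policy, p_reuse=0.8, actions=(-1, +1)):
    """Choose an action for `state`. With probability `p_reuse`, reuse a
    past policy whose hinted region contains the state; otherwise fall
    back to the new policy being learned, or explore uniformly if the
    new policy has no action for this state yet."""
    applicable = [h for h in hints if state in h.states]
    if applicable and random.random() < p_reuse:
        return random.choice(applicable).policy(state)
    if state in new_policy:
        return new_policy[state]
    return random.choice(actions)

# Example: one hint says "move right" is useful in states 0..4.
hints = [Hint(range(5), lambda s: +1)]
```

Under this sketch, exploration inside a hinted region is dominated by the corresponding past policy, letting the learner discover which of its action choices transfer to the new task while still exploring elsewhere.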