Markov Decision Processes: Discrete Stochastic Dynamic Programming
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Probabilistic policy reuse in a reinforcement learning agent
AAMAS '06 Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
Transfer Learning for Reinforcement Learning Domains: A Survey
The Journal of Machine Learning Research
Hi-index | 0.00 |
In this paper we introduce the MDP-A model for addressing a particular sub-class of non-stationary environments where the learner is required to interact with other agents. The behavior-policies of the agents are determined by a latent variable that changes rarely, but can modify the agent policies drastically when it does change (like traffic conditions in a driving problem). This unpredictable change in the latent variable results in non-stationarity. We frame this problem as transfer learning in a particular subclass of MDPs, which we call MDPs-with-agents (MDP-A), where each task/MDP requires the learner to learn to interact with opponent agents with fixed policies. Across the tasks, the state and action space remains the same (and is known) but the agent-policies change. We transfer information from previous tasks to quickly infer the combined agent behavior policy in a new task after some limited initial exploration, and hence rapidly learn an optimal/near-optimal policy. We propose a transfer algorithm which given a collection of source behavior policies, eliminates the policies that do not apply in the new task in time polynomial in the relevant parameters using novel a statistical test. We also perform experiments in three interesting domains and show that our algorithm significantly outperforms relevant alternative algorithms.