Cognitive policy learner: biasing winning or losing strategies
The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Distributed task assignment is a convenient abstraction for load-balancing applications, workflow systems, and supply-chain management. The topological features of such task networks are far from random; instead, they resemble those of small-world and scale-free networks, and agents' interactions are accordingly prescribed by this network structure. Simulations of decentralised optimisation algorithms within the mathematical framework of queueing theory have shown that knowledge of a neighbour's queueing state facilitates minimising the accrued delay in a network. Benign agents that share a neighbour can therefore pool their experience and collaborate in training a function approximator according to the SARSA(0) gradient-descent update rule. The function approximator resides on the target node, and its learnt state-action value mapping is shared among all nodes connecting to it. This setting is evaluated empirically using SARSA(0) reinforcement learning with both the standard $\epsilon$-greedy policy and the weighted policy learner. We show that under certain conditions this leads to improved system performance compared to individually trained function approximators.