Rewards for pairs of Q-learning agents conducive to turn-taking in medium-access games

  • Authors:
  • Peter A. Raffensperger; Philip J. Bones; Allan I. McInnes; Russell Y. Webb

  • Affiliations:
  • Peter A. Raffensperger, Philip J. Bones, Allan I. McInnes: Department of Electrical and Computer Engineering, University of Canterbury, New Zealand; Russell Y. Webb: Apple Computer Inc., Cupertino, California, USA

  • Venue:
  • Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems
  • Year:
  • 2012


Abstract

We describe a class of stateful games, which we call 'medium-access games', as a model for human and machine communication. We demonstrate how to use the Nash equilibria of these games, as played by pairs of agents with stationary policies, to predict turn-taking behaviour in Q-learning agents from the agents' reward functions. We identify which fixed policies exhibit turn-taking behaviour in medium-access games and show how to compute the Nash equilibria of such games by using Markov chain methods to calculate the agents' expected rewards under different stationary policies. We present simulation results for an extensive range of reward functions for pairs of Q-learners playing medium-access games, and we use our analysis of stationary agents to develop predictors for the emergence of turn-taking. We explain how to use these predictors to design reward functions for pairs of Q-learning agents that are conducive (or prohibitive) to the emergence of turn-taking in medium-access games. We focus on designing multi-agent reinforcement learning systems that deliberately produce coordinated turn-taking, but we also intend our results to be useful for analysing emergent turn-taking behaviour. Based on our turn-taking results, we suggest ways to use our methodology to design rewards for quantifiable behaviours besides turn-taking.
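The setting the abstract describes, pairs of Q-learners repeatedly choosing whether to transmit on a shared medium, with the previous joint action serving as state so that alternating (turn-taking) policies are representable, can be sketched as below. The payoff values (`r_success`, `r_collision`, `r_idle`), hyperparameters, and function names are illustrative assumptions for this sketch, not the paper's actual reward functions or methodology:

```python
import random

ACTIONS = (0, 1)  # 0 = WAIT, 1 = TRANSMIT
STATES = [(a, b) for a in ACTIONS for b in ACTIONS]  # previous joint action

def step_rewards(a1, a2, r_success=1.0, r_collision=-1.0, r_idle=0.0):
    # Hypothetical medium-access payoffs: a lone transmitter succeeds,
    # simultaneous transmissions collide, and waiting earns nothing.
    if a1 == 1 and a2 == 1:
        return r_collision, r_collision
    if a1 == 1:
        return r_success, r_idle
    if a2 == 1:
        return r_idle, r_success
    return r_idle, r_idle

def train(episodes=500, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Train two independent tabular Q-learners on the stateful game and
    return their Q-tables, keyed by (state, action)."""
    rng = random.Random(seed)
    q1 = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    q2 = {(s, a): 0.0 for s in STATES for a in ACTIONS}

    def choose(q, s):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[(s, a)])

    for _ in range(episodes):
        s = rng.choice(STATES)
        for _ in range(steps):
            a1, a2 = choose(q1, s), choose(q2, s)
            r1, r2 = step_rewards(a1, a2)
            s_next = (a1, a2)  # the joint action becomes the next state
            for q, a, r in ((q1, a1, r1), (q2, a2, r2)):
                best_next = max(q[(s_next, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q1, q2
```

In this sketch, turn-taking corresponds to the learned greedy policies alternating between the joint actions (1, 0) and (0, 1); whether that pattern emerges depends on the reward values, which is the relationship the paper characterises.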