Modeling non-stationary opponents

  • Authors:
  • Pablo Hernandez-Leal, Enrique Munoz de Cote, L. Enrique Sucar

  • Affiliations:
  • INAOE, Puebla, Mexico (all authors)

  • Venue:
  • Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
  • Year:
  • 2013


Abstract

This paper studies repeated interactions between an agent and an unknown opponent that changes its strategy over time. We propose a framework for learning switching non-stationary strategies. The approach uses decision trees to learn the opponent's most up-to-date strategy. The agent's strategy is then computed by transforming the tree into a Markov Decision Process (MDP), whose solution dictates the optimal way of playing against the learned strategy. The learned model is continuously re-evaluated to detect strategy switches: by measuring tree similarities, the method determines whether the opponent has changed its strategy and a new model must be learned. We evaluated the proposed approach in the iterated prisoner's dilemma, where it outperforms common strategies against both stationary and non-stationary opponents.
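The core step of the abstract, computing a best response by treating the learned opponent model as the transition dynamics of an MDP, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the opponent's decision tree is replaced by a hand-coded Tit-for-Tat stand-in, states are the previous joint action, and the MDP is solved with plain value iteration.

```python
import itertools

C, D = "C", "D"
# Prisoner's dilemma payoff to our agent for (my_action, opp_action).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

def tit_for_tat(prev_my, prev_opp):
    """Stand-in for the learned decision tree: predicts the opponent's
    next move from the previous joint action (TFT copies our last move)."""
    return prev_my

def best_response(opponent_model, gamma=0.95, iters=500):
    """Value iteration on the induced MDP.

    States are previous joint actions (my_move, opp_move); the opponent
    model deterministically predicts the opponent's next move, so each
    agent action leads to a single known successor state.
    """
    states = list(itertools.product([C, D], repeat=2))
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(PAYOFF[(a, opponent_model(*s))]
                    + gamma * V[(a, opponent_model(*s))]
                    for a in (C, D))
             for s in states}
    # Greedy policy with respect to the converged value function.
    policy = {}
    for s in states:
        opp_next = opponent_model(*s)
        policy[s] = max((C, D),
                        key=lambda a: PAYOFF[(a, opp_next)]
                        + gamma * V[(a, opp_next)])
    return policy

policy = best_response(tit_for_tat)
# Against Tit-for-Tat with a high discount factor, the optimal policy
# cooperates in every state: sustained mutual cooperation (payoff 3 per
# round) beats a one-shot defection gain of 5 followed by retaliation.
```

In the paper's full framework this best-response computation is re-run whenever the tree-similarity test signals that the opponent has switched strategies, so a fresh model (and hence a fresh MDP) replaces the stale one.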