Technical Note: Q-Learning
Machine Learning
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Implicit Negotiation in Repeated Games
ATAL '01 Revised Papers from the 8th International Workshop on Intelligent Agents VIII
Efficient learning of multi-step best response
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems
Social reward shaping in the prisoner's dilemma
Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3
EA2: The Winning Strategy for the Inaugural Lemonade Stand Game Tournament
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence
Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Planning how to interact with bounded memory and unbounded memory learning opponents requires different treatment. To date, work in this area has shown how to design plans against bounded memory learning opponents, but none has addressed the unbounded memory case. This paper closes that gap. In particular, we frame the problem as a planning problem in the framework of repeated matrix games, where the planner's objective is to compute the sequence of actions that best exploits a learning opponent. The particular class of opponent we study updates her beliefs via a fictitious play process, but the analysis generalizes to many forms of Bayesian learning agents. Our analysis is inspired by Banerjee and Peng's AIM framework, which supports planning and learning against bounded memory opponents (e.g., an adaptive player). Building on this, we show that an unbounded memory opponent (specifically a fictitious player) can also be modelled as a finite MDP, and we present a new efficient algorithm that exploits the opponent by computing, in polynomial time, a sequence of play whose average reward exceeds that obtained by playing a game-theoretic (Nash or correlated) equilibrium.
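The gap between myopic best response and planned exploitation of a fictitious player can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's algorithm: it pits a fictitious-play opponent in a repeated game of Chicken against two hypothetical planners, a myopic best-responder and a simple "teaching" planner that steers the opponent's empirical beliefs. The payoff matrices, uniform prior, and argmax tie-breaking are all assumptions made for the example.

```python
import numpy as np

# Actions: 0 = Dare, 1 = Chicken. P[i, j] is the planner's payoff and
# Q[i, j] the opponent's payoff when the planner plays i and the
# opponent plays j. (Illustrative Chicken payoffs, not from the paper.)
P = np.array([[0.0, 4.0],
              [1.0, 3.0]])
Q = np.array([[0.0, 1.0],
              [4.0, 3.0]])

def opponent_move(counts):
    """Fictitious play: best-respond to the empirical frequency of the
    planner's past actions (uniform prior via the initial counts)."""
    belief = counts / counts.sum()
    return int(np.argmax(belief @ Q))  # expected payoff of each column

def run(planner, T=100):
    counts = np.ones(2)  # opponent's counts of the planner's actions
    total = 0.0
    for _ in range(T):
        j = opponent_move(counts)  # opponent's move is predictable
        i = planner(j)
        total += P[i, j]
        counts[i] += 1   # the opponent observes the planner's action
    return total / T

myopic = lambda j: int(np.argmax(P[:, j]))  # best-respond each round
teacher = lambda j: 0                       # always Dare: shape beliefs

print(run(myopic))   # stuck chickening against a daring opponent
print(run(teacher))  # opponent learns to Chicken; planner averages ~4
```

The myopic planner chickens out on round one, which only reinforces the opponent's belief that daring is safe, and it stays trapped at an average reward of 1. The teaching planner accepts one bad round to push the opponent's belief past the threshold where chickening becomes her best response, then collects nearly 4 per round thereafter; the paper's contribution is computing such exploiting sequences optimally, in polynomial time, by modelling the fictitious player as a finite MDP.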