Learning from induced changes in opponent (re)actions in multi-agent games
AAMAS '06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
We describe a generalized Q-learning-type algorithm for reinforcement learning in competitive multi-agent games. We observe that in a competitive setting with adaptive agents, an agent's actions will likely induce changes in its opponents' policies. In addition to accounting for the estimated policies of the opponents, our algorithm therefore also adjusts these estimated future opponent policies by incorporating estimates of how the opponents change their policies in reaction to the agent's own actions. We present results showing that agents learning with this algorithm can achieve high reward in competitive multi-agent games where myopic self-interested behavior conflicts with the long-term individual interests of the players. We show that this approach scales to multi-agent games of various sizes, in particular to social-dilemma problems: from the small iterated Prisoner's Dilemma to larger settings akin to Hardin's Tragedy of the Commons. Our multi-agent reinforcement learning algorithm is thus foresighted enough to correctly anticipate future rewards in the important problem class of social dilemmas, without having to resort to negotiation-like protocols or pre-coded strategies.
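
The abstract describes the algorithm only at a high level, so the following is a minimal, hedged Python sketch of the core idea: a Q-learner that keeps a count-based model of how an opponent reacts to its own actions and evaluates actions under the opponent policy they induce. The payoff matrix, class and method names, parameter values, and update rule are illustrative assumptions, not the paper's actual formulation.

import random
from collections import defaultdict

ACTIONS = ["C", "D"]                       # cooperate / defect
PAYOFF = {("C", "C"): 3, ("C", "D"): 0,    # assumed standard Prisoner's Dilemma payoffs
          ("D", "C"): 5, ("D", "D"): 1}

class InducedReactionLearner:
    """Q-learner that models the opponent's reaction to its own actions (sketch)."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)                          # Q[(my_a, opp_a)]
        self.react = defaultdict(lambda: defaultdict(int))   # my_a -> counts of opponent's next action

    def induced_policy(self, my_a):
        # Empirical distribution of the opponent's *next* action after I play my_a.
        counts, total = self.react[my_a], sum(self.react[my_a].values())
        if total == 0:
            return {b: 1.0 / len(ACTIONS) for b in ACTIONS}
        return {b: counts[b] / total for b in ACTIONS}

    def value(self, my_a):
        # Expected Q-value of my_a under the opponent policy it induces.
        pol = self.induced_policy(my_a)
        return sum(pol[b] * self.q[(my_a, b)] for b in ACTIONS)

    def act(self):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=self.value)

    def update(self, my_a, opp_a, opp_next_a):
        # Learn the induced reaction, then do a standard Q backup whose
        # bootstrap term already evaluates actions under the induced policies.
        self.react[my_a][opp_next_a] += 1
        target = PAYOFF[(my_a, opp_a)] + self.gamma * max(self.value(a) for a in ACTIONS)
        self.q[(my_a, opp_a)] += self.alpha * (target - self.q[(my_a, opp_a)])

# Usage: against a Tit-for-Tat opponent the reaction model learns that
# defection provokes defection, so the agent settles on cooperation.
agent, opp_a = InducedReactionLearner(), "C"   # Tit-for-Tat opens with C
for _ in range(5000):
    my_a = agent.act()
    opp_next = my_a                            # Tit-for-Tat copies my last move
    agent.update(my_a, opp_a, opp_next)
    opp_a = opp_next
print(max(ACTIONS, key=agent.value))           # typically prints "C"

Under these assumptions the sketch captures the abstract's central point: the value of defecting is computed against the opponent policy that defection would induce, not against the opponent's current policy, which is what makes the agent foresighted in social dilemmas.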