Utility Based Q-learning to Maintain Cooperation in Prisoner's Dilemma Games

  • Authors:
  • Koichi Moriyama

  • Affiliations:
  • -

  • Venue:
  • IAT '07 Proceedings of the 2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology
  • Year:
  • 2007

Abstract

This work deals with Q-learning in a multiagent environment. There are many multiagent Q-learning methods, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). However, normal Q-learning agents that choose actions stochastically to avoid local optima may bring about mutual cooperation in PD. Although such mutual cooperation usually occurs only sporadically, it can be maintained if the Q-function of cooperation becomes larger than that of defection after the cooperation. This work derives a theorem on how many times cooperation is needed to make the Q-function of cooperation larger than that of defection. In addition, from the perspective of the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD, this work also derives a corollary on how much utility is necessary to make the Q-function larger through one-shot mutual cooperation.
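
The theorem itself is not reproduced in the abstract, but the mechanism it analyzes can be illustrated. Below is a minimal sketch, assuming ε-greedy exploration, standard single-state Q-updates, and conventional PD payoffs (T > R > P > S); the payoff values, learning rate α, discount γ, and exploration rate ε are illustrative assumptions, not the paper's. It counts how many consecutive mutual cooperations are needed before Q(C) exceeds Q(D) for an agent that has already learned under mutual defection, and shows that substituting a larger utility for the raw cooperation reward (the idea behind the corollary) shrinks that count, down to one for a sufficiently large utility.

```python
import random

# Hypothetical PD payoffs with T > R > P > S; the paper's values may differ.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters

class QAgent:
    """Single-state Q-learner over the two PD actions, with epsilon-greedy choice."""
    def __init__(self):
        self.q = {"C": 0.0, "D": 0.0}

    def act(self):
        # Stochastic action choice to escape local optima, as in the abstract.
        if random.random() < EPSILON:
            return random.choice(["C", "D"])
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Q(a) <- Q(a) + alpha * (r + gamma * max_a' Q(a') - Q(a))
        target = reward + GAMMA * max(self.q.values())
        self.q[action] += ALPHA * (target - self.q[action])

def cooperations_needed(coop_value):
    """Count consecutive mutual cooperations (each paying `coop_value`, a raw
    reward or a substituted utility) until Q(C) > Q(D), starting from an agent
    that has settled into mutual defection (payoff P each round)."""
    agent = QAgent()
    for _ in range(500):
        agent.update("D", P)           # learn the defection baseline first
    n = 0
    while agent.q["C"] <= agent.q["D"]:
        agent.update("C", coop_value)  # forced mutual cooperation
        n += 1
    return n

if __name__ == "__main__":
    print("with raw reward R =", R, "->", cooperations_needed(R), "cooperations")
    print("with utility 10   ->", cooperations_needed(10.0), "cooperations")
    print("with utility 100  ->", cooperations_needed(100.0), "cooperation(s)")
```

Run as-is, this sketch needs on the order of a dozen cooperations when only the raw reward R is used, fewer with a moderately larger utility, and a single one once the utility is large enough; the paper's theorem and corollary make these thresholds exact under its own assumptions.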