Utility based Q-learning to facilitate cooperation in Prisoner's Dilemma games

  • Authors:
  • Koichi Moriyama

  • Affiliations:
  • The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan. E-mail: koichi@ai.sanken.osaka-u.ac.jp

  • Venue:
  • Web Intelligence and Agent Systems
  • Year:
  • 2009

Abstract

This work deals with Q-learning in a multiagent environment. There are many multiagent Q-learning methods, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). However, normal Q-learning agents that choose actions stochastically to avoid local optima may reach mutual cooperation in a PD game. Although such mutual cooperation usually occurs only as an isolated event, it can be facilitated if the Q-function of cooperation becomes larger than that of defection after the cooperation. This work derives a theorem on how many consecutive repetitions of mutual cooperation are needed to make the Q-function of cooperation larger than that of defection. In addition, building on the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD games, this work also derives a corollary on how much utility is necessary to make the Q-function of cooperation larger than that of defection through a single (one-shot) mutual cooperation.
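
To illustrate the kind of question the theorem answers (not the paper's actual derivation or parameter values), the following minimal Python sketch runs stateless Q-learning for one agent in a symmetric PD and counts how many consecutive rounds of mutual cooperation are needed before Q(C) exceeds Q(D). The payoff values (T=5, R=3, P=1, S=0), the learning rate, the discount factor, and the initial Q-values are all assumptions chosen for illustration, not taken from the paper.

```python
# Minimal sketch: stateless Q-learning in a Prisoner's Dilemma, counting the
# consecutive mutual cooperations needed until Q(C) > Q(D).
# All numeric choices below are assumed for illustration only.

T, R, P, S = 5.0, 3.0, 1.0, 0.0   # assumed PD payoffs: temptation, reward, punishment, sucker
alpha, gamma = 0.1, 0.9           # assumed learning rate and discount factor

# Assumed starting point: Q-values settled under mutual defection, plus one
# exploited cooperation, so defection initially looks better (Q(D) > Q(C)).
Q = {"C": S + gamma * P / (1 - gamma), "D": P / (1 - gamma)}

rounds = 0
while Q["C"] <= Q["D"]:
    # Mutual cooperation: the agent plays C and receives reward R.
    # Standard single-state Q-learning update; only the chosen action is updated.
    Q["C"] += alpha * (R + gamma * max(Q.values()) - Q["C"])
    rounds += 1

print(f"Q(C) exceeds Q(D) after {rounds} consecutive mutual cooperations")
print(f"Q(C) = {Q['C']:.3f}, Q(D) = {Q['D']:.3f}")
```

In the utility-based view the abstract refers to, the reward R fed into this update would be replaced by a utility value derived from the reward; the corollary's question then corresponds to asking how large that utility must be for a single cooperative round to flip the comparison in the loop above.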