Learning-Rate Adjusting Q-Learning for Prisoner's Dilemma Games

  • Authors:
  • Koichi Moriyama

  • Affiliations:
  • -

  • Venue:
  • WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02
  • Year:
  • 2008

Abstract

Many multiagent Q-learning algorithms have been proposed to date, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). In a previous paper, the author proposed utility-based Q-learning for PD, which uses utilities in place of rewards in order to maintain mutual cooperation once it has occurred. However, since an agent's action depends on the relative magnitudes of its Q-values, mutual cooperation can also be maintained by adjusting the learning rate of Q-learning. Thus, in this paper, we manipulate the learning rate directly and introduce a new Q-learning method called learning-rate adjusting Q-learning, or LRA-Q.
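The idea can be illustrated with a minimal sketch: a stateless tabular Q-learner for the iterated PD whose learning rate is adjusted between updates. Note that the payoff values and the adjustment rule shown here are placeholder assumptions for illustration, not the paper's actual LRA-Q rule.

```python
import random

# Assumed row-player payoffs for the Prisoner's Dilemma:
# (my action, opponent action) -> reward.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


class QAgent:
    """Stateless tabular Q-learner with an adjustable learning rate."""

    def __init__(self, alpha=0.1, epsilon=0.1):
        self.q = {a: 0.0 for a in ACTIONS}
        self.alpha = alpha      # learning rate, adjusted between updates
        self.epsilon = epsilon  # exploration probability

    def act(self, rng):
        # Epsilon-greedy action selection: the greedy choice depends only
        # on the relative magnitudes of the Q-values, which is why tuning
        # the learning rate can preserve a cooperative preference.
        if rng.random() < self.epsilon:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[a])

    def update(self, action, reward):
        # Standard Q-learning update for a repeated single-state game.
        self.q[action] += self.alpha * (reward - self.q[action])

    def adjust_alpha(self, my_action, other_action):
        # Placeholder adjustment rule (NOT the paper's LRA-Q rule):
        # learn slowly after mutual cooperation, so the Q-value of C is
        # not quickly eroded by the opponent's occasional exploration,
        # and learn fast otherwise.
        self.alpha = 0.01 if (my_action, other_action) == ("C", "C") else 0.2


rng = random.Random(0)
a, b = QAgent(), QAgent()
for _ in range(2000):
    act_a, act_b = a.act(rng), b.act(rng)
    a.update(act_a, PAYOFF[(act_a, act_b)])
    b.update(act_b, PAYOFF[(act_b, act_a)])
    a.adjust_alpha(act_a, act_b)
    b.adjust_alpha(act_b, act_a)
```

Because each update is an exponential moving average of received payoffs, every Q-value stays within the payoff range [0, 5]; the design question the paper addresses is how to schedule `alpha` so that the cooperative Q-value stays on top once mutual cooperation emerges.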