Learning-Rate Adjusting Q-Learning for Two-Person Two-Action Symmetric Games

  • Authors:
  • Koichi Moriyama

  • Affiliations:
  • The Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka 567-0047, Japan

  • Venue:
  • KES-AMSTA '09: Proceedings of the Third KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications
  • Year:
  • 2009

Abstract

Many multiagent Q-learning methods exist, and most aim to converge to a Nash equilibrium, which is undesirable in games such as the Prisoner's Dilemma (PD). The author previously proposed utility-based Q-learning (UB-Q) for PD, which uses utilities instead of rewards in order to maintain mutual cooperation once it has occurred. However, UB-Q must know the payoffs of the game to calculate the utilities, and it works only in PD. Since a Q-learning agent's action depends on the relative magnitudes of its Q-values, mutual cooperation can also be maintained by adjusting the learning rate. This paper therefore manipulates the learning rate directly and introduces another Q-learning method, learning-rate adjusting Q-learning (LRA-Q). It calculates the learning rate from received payoffs and works not only in PD but also in other kinds of two-person two-action symmetric games. Numerical verification showed the success of LRA-Q, but it also revealed a side effect.
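
A minimal sketch in Python of the underlying idea (not the paper's actual algorithm): a single-state Q-learner whose learning rate is computed from the received payoff, playing the iterated Prisoner's Dilemma in self-play. The payoff-to-learning-rate mapping below is an illustrative assumption; LRA-Q derives its own rule from the game's payoffs.

    import random

    ACTIONS = ["C", "D"]  # cooperate, defect

    # Prisoner's Dilemma payoffs, indexed by (my_action, opponent_action)
    PD_PAYOFF = {
        ("C", "C"): 3, ("C", "D"): 0,
        ("D", "C"): 5, ("D", "D"): 1,
    }

    class PayoffAdaptiveQLearner:
        def __init__(self, epsilon=0.1):
            self.q = {a: 0.0 for a in ACTIONS}
            self.epsilon = epsilon

        def choose(self):
            # epsilon-greedy choice between the two actions
            if random.random() < self.epsilon:
                return random.choice(ACTIONS)
            return max(self.q, key=self.q.get)

        def learning_rate(self, payoff):
            # Hypothetical mapping: learn more slowly from high payoffs so
            # that a rewarding joint outcome, once reached, is not quickly
            # unlearned. LRA-Q also computes its rate from payoffs, but
            # with the formula derived in the paper.
            max_payoff = max(PD_PAYOFF.values())
            return 0.5 * (1.0 - payoff / max_payoff) + 0.05

        def update(self, action, payoff):
            # Stateless Q-update: Q(a) += alpha * (r - Q(a)),
            # with alpha depending on the received payoff r
            alpha = self.learning_rate(payoff)
            self.q[action] += alpha * (payoff - self.q[action])

    # Self-play in the iterated PD
    a, b = PayoffAdaptiveQLearner(), PayoffAdaptiveQLearner()
    for _ in range(10000):
        ax, bx = a.choose(), b.choose()
        a.update(ax, PD_PAYOFF[(ax, bx)])
        b.update(bx, PD_PAYOFF[(bx, ax)])
    print(a.q, b.q)

The only point the sketch illustrates is that the learning rate alpha is a function of the received payoff rather than a constant; whether mutual cooperation is actually sustained depends on the exact mapping, which is what the paper works out.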