QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games

  • Authors:
  • Benoît Frénay; Marco Saerens

  • Affiliations:
  • Machine Learning Group, EPL/ELEC/DICE, Université catholique de Louvain, Place du Levant, 3 (office a.188.30), 1348 Louvain-la-Neuve, Belgium
  • Machine Learning Group, ESPO/LSM/ISYS, Université catholique de Louvain, Place des Doyens, 1 (office b.108), 1348 Louvain-la-Neuve, Belgium

  • Venue:
  • Neurocomputing
  • Year:
  • 2009

Abstract

Markov games are a framework that can be used to formalise n-agent reinforcement learning (RL). Littman (Markov games as a framework for multi-agent reinforcement learning, Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994) uses this framework to model two-agent zero-sum problems and, within this context, proposes the minimax-Q algorithm. This paper reviews RL algorithms for two-player zero-sum Markov games and introduces a new, simple, fast algorithm called QL2. QL2 is compared to several standard algorithms (Q-learning, minimax and minimax-Q) implemented with the Qash library written in Python. The experiments show that QL2 converges empirically to optimal mixed policies, as minimax-Q does, but uses a surprisingly simple and cheap updating rule.
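The abstract does not spell out either update rule, but since the paper benchmarks against Littman's minimax-Q, a brief sketch of that baseline may clarify what "simple and cheap" is being compared to. The following is a minimal tabular sketch, assuming discrete state and action spaces and using scipy's linprog to solve the stage-game linear program; it is not the QL2 rule itself, and it does not reproduce the Qash library's actual API.

```python
# A minimal sketch of the minimax-Q update (Littman, 1994) for a two-player
# zero-sum Markov game. The data structures, learning-rate choice, and the
# use of scipy's LP solver are illustrative assumptions, not the paper's code.
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    """Solve max_pi min_o sum_a pi[a] * Q_s[a, o] by linear programming.

    Q_s is the |A| x |O| payoff matrix at one state. Returns the game
    value v and the maximising mixed policy pi over the agent's actions.
    """
    n_a, n_o = Q_s.shape
    # Decision variables: [pi_1, ..., pi_nA, v]; maximise v <=> minimise -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o: v - sum_a pi[a] * Q_s[a, o] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The mixed policy must sum to one; v is unconstrained in sign.
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.ones(1)
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular step: Q(s,a,o) <- (1-alpha) Q(s,a,o) + alpha (r + gamma V(s'))."""
    Q[s][a, o] = (1.0 - alpha) * Q[s][a, o] + alpha * (r + gamma * V[s_next])
    # Refresh the state value by re-solving the stage game at s.
    V[s], _ = minimax_value(Q[s])
    return Q, V
```

The LP step is what makes minimax-Q expensive: every update re-solves a linear program over the agent's mixed policy at the visited state, which is presumably the kind of per-step cost that a cheaper updating rule such as QL2 is designed to avoid.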