QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games

  • Authors:
  • Benoît Frénay; Marco Saerens

  • Affiliations:
  • Machine Learning Group, EPL/ELEC/DICE, Université catholique de Louvain, Place du Levant, 3 (office a.188.30), 1348 Louvain-la-Neuve, Belgium
  • Machine Learning Group, ESPO/LSM/ISYS, Université catholique de Louvain, Place des Doyens, 1 (office b.108), 1348 Louvain-la-Neuve, Belgium

  • Venue:
  • Neurocomputing
  • Year:
  • 2009

Abstract

Markov games are a framework that can be used to formalise n-agent reinforcement learning (RL). Littman (Markov games as a framework for multi-agent reinforcement learning, Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994) uses this framework to model two-agent zero-sum problems and, within this context, proposes the minimax-Q algorithm. This paper reviews RL algorithms for two-player zero-sum Markov games and introduces a new, simple, fast algorithm called QL2. QL2 is compared to several standard algorithms (Q-learning, minimax and minimax-Q) implemented with the Qash library written in Python. The experiments show that QL2 converges empirically to optimal mixed policies, as minimax-Q does, but uses a surprisingly simple and cheap updating rule.
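The abstract does not spell out either update rule, but since the paper benchmarks against Littman's minimax-Q, a brief sketch of that baseline may clarify what "simple and cheap" is being compared to. The following is a minimal tabular sketch, assuming discrete state and action spaces and using scipy's linprog to solve the stage-game linear program; it is not the QL2 rule itself, and it does not reproduce the Qash library's actual API.

```python
# A minimal sketch of the minimax-Q update (Littman, 1994) for a two-player
# zero-sum Markov game. The data structures, learning-rate choice, and the
# use of scipy's LP solver are illustrative assumptions, not the paper's code.
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    """Solve max_pi min_o sum_a pi[a] * Q_s[a, o] by linear programming.

    Q_s is the |A| x |O| payoff matrix at one state. Returns the game
    value v and the maximising mixed policy pi over the agent's actions.
    """
    n_a, n_o = Q_s.shape
    # Decision variables: [pi_1, ..., pi_nA, v]; maximise v <=> minimise -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o: v - sum_a pi[a] * Q_s[a, o] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The mixed policy must sum to one; v is unconstrained in sign.
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.ones(1)
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular step: Q(s,a,o) <- (1-alpha) Q(s,a,o) + alpha (r + gamma V(s'))."""
    Q[s][a, o] = (1.0 - alpha) * Q[s][a, o] + alpha * (r + gamma * V[s_next])
    # Refresh the state value by re-solving the stage game at s.
    V[s], _ = minimax_value(Q[s])
    return Q, V
```

The LP step is what makes minimax-Q expensive: every update re-solves a linear program over the agent's mixed policy at the visited state, which is presumably the kind of per-step cost that a cheaper updating rule such as QL2 is designed to avoid.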