Markov games are a framework that can be used to formalise n-agent reinforcement learning (RL). Littman (Markov games as a framework for multi-agent reinforcement learning, in: Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994) uses this framework to model two-agent zero-sum problems and, within this context, proposes the minimax-Q algorithm. This paper reviews RL algorithms for two-player zero-sum Markov games and introduces a new, simple, and fast algorithm called QL2. QL2 is compared to several standard algorithms (Q-learning, Minimax and minimax-Q) implemented with the Qash library written in Python. The experiments show that QL2 converges empirically to optimal mixed policies, as minimax-Q does, but uses a surprisingly simple and cheap updating rule.
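For readers unfamiliar with the baseline, the following is a minimal sketch of a minimax-Q style update in the spirit of Littman (1994). It is not the QL2 rule, which the abstract does not specify, and it does not use the Qash library. All names here (`solve_2x2_zero_sum`, `minimax_q_update`, the `Q` and `V` tables) are hypothetical. The sketch also assumes each agent has exactly two actions per state, so that the stage game can be solved in closed form; the general algorithm solves a linear program at this step.

```python
def solve_2x2_zero_sum(M):
    """Value and optimal row strategy of a 2x2 zero-sum matrix game.

    M[a][o] is the payoff to the row player (maximiser) when it plays
    action a and the opponent plays action o.
    Returns (value, p), where p is the probability of row action 0.
    """
    (a, b), (c, d) = M
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:
        # Pure saddle point: a deterministic policy is already optimal.
        return float(maximin), (1.0 if row_mins[0] >= row_mins[1] else 0.0)
    # No saddle point: the optimal policy is mixed and makes the
    # opponent indifferent between its two actions.
    denom = a - b - c + d
    return (a * d - b * c) / denom, (d - c) / denom


def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular update after observing (s, a, o, r, s_next).

    Q[s] is a 2x2 payoff table indexed by [own action][opponent action];
    V[s] is the current estimate of the game value of state s.
    Returns the updated probability of playing action 0 in state s.
    """
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * (r + gamma * V[s_next])
    V[s], p = solve_2x2_zero_sum(Q[s])
    return p


if __name__ == "__main__":
    # Matching pennies: the optimal policy is the uniform mixed policy.
    print(solve_2x2_zero_sum([[1, -1], [-1, 1]]))
```

In matching pennies no deterministic policy is optimal, which is why the experiments are judged on convergence to mixed policies. Solving the stage game at every update is the costly part of this baseline.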