Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization

Authors:
Michael Johanson;Nolan Bard;Marc Lanctot;Richard Gibson;Michael Bowling
Affiliations:
University of Alberta, Edmonton, Alberta;University of Alberta, Edmonton, Alberta;University of Alberta, Edmonton, Alberta;University of Alberta, Edmonton, Alberta;University of Alberta, Edmonton, Alberta
Venue:
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Year:
2012

Abstract

Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling-based approaches are possible. For instance, chance-sampled CFR considers just a single chance outcome per traversal, resulting in faster but less precise iterations. While more iterations are required, chance-sampled CFR requires less time overall to converge. In this work, we present new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O(n^2) work at terminal nodes in O(n) time. We demonstrate that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff.
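
To illustrate the kind of saving behind evaluating O(n^2) terminal comparisons in O(n) time, here is a minimal sketch and not the authors' implementation. It assumes a simplified showdown: each player holds one of n private hands, each hand has an integer strength, the payoff is +1 for a win, -1 for a loss and 0 for a tie, and card-removal effects between the two players' hands are ignored. The function names, the payoff scale, and the `opp_reach` weights (the opponent's probability of reaching the terminal node with each hand) are hypothetical choices made for this example. The naive version compares every pair of hands; the fast version sorts by strength and sweeps once, maintaining running sums of opponent reach probability.

```python
def showdown_values_naive(my_ranks, opp_ranks, opp_reach):
    """O(n^2): compare each of my hands against every opponent hand."""
    values = []
    for r in my_ranks:
        v = 0.0
        for s, p in zip(opp_ranks, opp_reach):
            if r > s:
                v += p      # I win against this opponent hand
            elif r < s:
                v -= p      # I lose against this opponent hand
        values.append(v)
    return values


def showdown_values_fast(my_ranks, opp_ranks, opp_reach):
    """Same values via one sweep over hands sorted by strength.

    The sort is done here only to keep the sketch self-contained; if the
    hands are already ordered by strength, the sweep itself is linear.
    """
    opp = sorted(zip(opp_ranks, opp_reach))
    order = sorted(range(len(my_ranks)), key=lambda i: my_ranks[i])
    total = sum(opp_reach)

    values = [0.0] * len(my_ranks)
    weaker = 0.0         # reach mass of opponent hands strictly weaker than mine
    not_stronger = 0.0   # reach mass of opponent hands weaker than or tied with mine
    lo = hi = 0
    for i in order:      # my hands in ascending strength, so pointers only advance
        r = my_ranks[i]
        while lo < len(opp) and opp[lo][0] < r:
            weaker += opp[lo][1]
            lo += 1
        while hi < len(opp) and opp[hi][0] <= r:
            not_stronger += opp[hi][1]
            hi += 1
        stronger = total - not_stronger
        values[i] = weaker - stronger
    return values
```

Both functions return one value per hand in `my_ranks`, in the original order. In a real poker domain the two players' hands share a deck, so a full treatment would also have to correct for opponent hands that conflict with the player's own cards; that correction is left out of this sketch.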