Safe opponent exploitation

Authors:
Samuel Ganzfried; Tuomas Sandholm
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 13th ACM Conference on Electronic Commerce
Year:
2012



Abstract

We consider the problem of playing a finitely repeated two-player zero-sum game safely, that is, guaranteeing at least the value of the game per period in expectation regardless of the strategy used by the opponent. Playing a stage-game equilibrium strategy at each time step clearly guarantees safety, and prior work has conjectured that it is impossible to simultaneously deviate from a stage-game equilibrium (in the hope of exploiting a suboptimal opponent) and guarantee safety. We show that such profitable deviations are indeed possible, specifically in games where certain types of "gift" strategies exist, which we define formally. We show that the set of strategies constituting such gifts can be strictly larger than the set of iteratively weakly dominated strategies; this disproves another recent conjecture, which states that all non-iteratively-weakly-dominated strategies are best responses to each equilibrium strategy of the other player. We present a full characterization of safe strategies and develop efficient algorithms for exploiting suboptimal opponents while guaranteeing safety. We also provide analogous results for sequential perfect- and imperfect-information games, and present safe exploitation algorithms and full characterizations of safe strategies for those settings as well. We present experimental results in Kuhn poker, a canonical test problem for game-theoretic algorithms. Our experiments show that (1) aggressive safe exploitation strategies significantly outperform adjusting the exploitation within equilibrium strategies and (2) all the safe exploitation strategies significantly outperform a (non-safe) best-response strategy against strong dynamic opponents.
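
To make the idea of deviating from equilibrium while staying safe concrete, the following is a minimal sketch and not the paper's algorithms. It assumes a hypothetical stage game (rock-paper-scissors with an extra column action `T` that gives the row player a gift), a known game value and equilibrium strategy, and a deliberately conservative mixing rule: the row player shifts weight from the equilibrium toward a best response only as far as a running budget of gifts received in expectation allows. All names (`PAYOFF`, `k_safe_strategy`, `update_budget`) are illustrative assumptions.

```python
# Row player's payoffs in a HYPOTHETICAL zero-sum stage game:
# rock-paper-scissors plus a fourth column action "T" that is a gift.
PAYOFF = [
    [0, -1, 1, 4],   # row plays R
    [1, 0, -1, 3],   # row plays P
    [-1, 1, 0, 3],   # row plays S
]
VALUE = 0.0                        # value of this stage game (assumed known)
EQUILIBRIUM = [1 / 3, 1 / 3, 1 / 3]  # an equilibrium strategy for the row player


def payoff_vs_column(strategy, col):
    """Expected row payoff of a mixed row strategy against one column action."""
    return sum(p * PAYOFF[r][col] for r, p in enumerate(strategy))


def worst_case(strategy):
    """Expected payoff guaranteed by the strategy against any column action."""
    return min(payoff_vs_column(strategy, c) for c in range(len(PAYOFF[0])))


def best_response(opponent_model):
    """Pure row action (as a one-hot list) maximizing payoff against the model."""
    rows = range(len(PAYOFF))
    expected = [sum(q * PAYOFF[r][c] for c, q in enumerate(opponent_model))
                for r in rows]
    best = max(rows, key=lambda r: expected[r])
    return [1.0 if r == best else 0.0 for r in rows]


def k_safe_strategy(budget, opponent_model):
    """Mix equilibrium and best response so that worst_case >= VALUE - budget.

    Assumption: this linear mixing rule is a conservative simplification
    chosen for illustration, not an optimal safe strategy.
    """
    br = best_response(opponent_model)
    shortfall = VALUE - worst_case(br)   # how much the best response can lose
    alpha = 1.0 if shortfall <= 0 else min(1.0, budget / shortfall)
    return [(1 - alpha) * e + alpha * b for e, b in zip(EQUILIBRIUM, br)]


def update_budget(budget, strategy, observed_col):
    """Add the expected gain over the game value against the observed action."""
    return budget + payoff_vs_column(strategy, observed_col) - VALUE
```

With a budget of zero the sketch plays the equilibrium strategy. Once the opponent plays `T`, the budget becomes positive and the row player can shift weight toward a best response to its opponent model. Because each strategy's worst case is at least `VALUE - budget`, the updated budget never drops below zero, which is the sense in which the deviation remains safe in expectation.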