We consider the problem of playing a finitely-repeated two-player zero-sum game safely: that is, guaranteeing at least the value of the game per period in expectation regardless of the strategy used by the opponent. Playing a stage-game equilibrium strategy at each time step clearly guarantees safety, and prior work has conjectured that it is impossible to simultaneously deviate from a stage-game equilibrium (in the hope of exploiting a suboptimal opponent) and guarantee safety. We show that such profitable deviations are indeed possible, specifically in games where certain types of 'gift' strategies exist, which we define formally. We show that the set of strategies constituting such gifts can be strictly larger than the set of iteratively weakly-dominated strategies; this disproves another recent conjecture, which states that all non-iteratively-weakly-dominated strategies are best responses to each equilibrium strategy of the other player. We present a full characterization of safe strategies and develop efficient algorithms for exploiting suboptimal opponents while guaranteeing safety. We also provide analogous results for sequential perfect-information and imperfect-information games, and present safe exploitation algorithms and full characterizations of safe strategies for those settings as well. We present experimental results in Kuhn poker, a canonical test problem for game-theoretic algorithms. Our experiments show that (1) aggressive safe exploitation strategies significantly outperform adjusting the exploitation within equilibrium strategies and (2) all the safe exploitation strategies significantly outperform a (non-safe) best response strategy against strong dynamic opponents.
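The safety notion in the abstract can be made concrete with a small sketch. The payoff matrix below is a hypothetical rock-paper-scissors variant invented for illustration (it is not taken from the paper): the opponent has a fourth action, T, that always loses. The names `PAYOFF`, `guarantee`, `exploitability`, and `gifts` are likewise assumptions, and "gift" is used here only in the informal sense of an opponent action that hands the equilibrium player more than the game value; the paper's formal definition is not reproduced.

```python
from fractions import Fraction as F

# Hypothetical payoffs to the row player (illustration only).
# Rows: R, P, S. Columns: R, P, S, T, where T is a deliberately bad opponent action.
PAYOFF = [
    [F(0), F(-1), F(1), F(4)],
    [F(1), F(0), F(-1), F(3)],
    [F(-1), F(1), F(0), F(3)],
]

def payoff_vs_columns(row_strategy):
    """Expected row-player payoff against each opponent pure action."""
    return [sum(p * PAYOFF[i][j] for i, p in enumerate(row_strategy))
            for j in range(len(PAYOFF[0]))]

def payoff_vs_rows(col_strategy):
    """Expected row-player payoff of each row pure action against a column mix."""
    return [sum(q * PAYOFF[i][j] for j, q in enumerate(col_strategy))
            for i in range(len(PAYOFF))]

def guarantee(row_strategy):
    """Worst-case expected payoff of a row strategy in a single stage."""
    return min(payoff_vs_columns(row_strategy))

# Uniform play over R, P, S for both players (T gets probability 0).
row_uniform = [F(1, 3), F(1, 3), F(1, 3)]
col_uniform = [F(1, 3), F(1, 3), F(1, 3), F(0)]

lower = guarantee(row_uniform)            # what the row player can secure
upper = max(payoff_vs_rows(col_uniform))  # what the column player can hold the row to
value = lower                             # lower == upper, so this is the game value

def exploitability(row_strategy):
    """How far below the value a row strategy can be pushed in one stage."""
    return value - guarantee(row_strategy)

# Extra expected payoff (beyond the value) the uniform strategy receives
# when the opponent plays each pure action; a positive entry is a "gift".
gifts = [v - value for v in payoff_vs_columns(row_uniform)]

always_rock = [F(1), F(0), F(0)]
print("value:", value)
print("gifts by opponent action:", gifts)
print("exploitability of always-rock:", exploitability(always_rock))
```

In this toy game, an opponent who plays T against the uniform strategy gives away 10/3 in expectation, which exceeds the one-stage exploitability of always playing rock (1). Deviating afterwards therefore keeps the expected total at or above the value per period, which is the kind of profitable yet safe deviation the abstract describes.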