Strategy evaluation in extensive games with importance sampling

  • Authors:
  • Michael Bowling; Michael Johanson; Neil Burch; Duane Szafron

  • Affiliations:
  • University of Alberta, Edmonton, Alberta, Canada (all authors)

  • Venue:
  • Proceedings of the 25th International Conference on Machine Learning (ICML)
  • Year:
  • 2008

Abstract

Agent evaluation is typically done through Monte Carlo estimation. However, stochastic agent decisions and stochastic outcomes can make this approach inefficient, requiring many samples for an accurate estimate. We present a new technique that can be used to simultaneously evaluate many strategies while playing a single strategy in the context of an extensive game. This technique is based on importance sampling, but uses two new mechanisms for significantly reducing variance in the estimates. We demonstrate its effectiveness in the domain of poker, where stochasticity makes traditional evaluation problematic.
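
To make the core idea concrete, the sketch below shows plain importance sampling for off-policy strategy evaluation in a toy one-decision game with a chance event. This is not the paper's estimator (which adds two variance-reduction mechanisms for extensive games); it only illustrates the basic mechanism the abstract refers to: play one strategy, then re-weight the sampled outcomes by a likelihood ratio to estimate the value of another strategy. The game, policies, and function names are hypothetical.

```python
import random

# Minimal sketch of off-policy evaluation via importance sampling.
# Toy setting: one decision node ("raise" or "fold") followed by a
# 50/50 chance event. Chance probabilities cancel in the likelihood
# ratio, so only the decision node contributes to the weight.

def play_policy(action):
    """Behavior strategy actually being played (action probabilities)."""
    return {"raise": 0.5, "fold": 0.5}[action]

def eval_policy(action):
    """Target strategy we want to evaluate without playing it."""
    return {"raise": 0.8, "fold": 0.2}[action]

def payoff(action, chance_win):
    """Stochastic outcome: payoff depends on the action and a chance event."""
    if action == "fold":
        return -1.0
    return 2.0 if chance_win else -2.0

def importance_sampling_estimate(num_samples, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Sample a trajectory by playing the behavior strategy.
        action = "raise" if rng.random() < play_policy("raise") else "fold"
        chance_win = rng.random() < 0.5
        utility = payoff(action, chance_win)
        # Re-weight by the ratio of the two strategies' probabilities
        # of reaching this trajectory (unbiased, but possibly high variance).
        weight = eval_policy(action) / play_policy(action)
        total += weight * utility
    return total / num_samples

if __name__ == "__main__":
    # True value of eval_policy here is 0.8 * 0.0 + 0.2 * (-1.0) = -0.2.
    print("Estimated value of eval_policy:", importance_sampling_estimate(100_000))
```

In this toy example the estimate converges to the target strategy's true expected value of -0.2, but the variance of the plain estimator grows as the two strategies diverge, which is the problem the paper's variance-reduction mechanisms address.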