Optimal unbiased estimators for evaluating agent performance

Authors:
Martin Zinkevich;Michael Bowling;Nolan Bard;Morgan Kan;Darse Billings
Affiliations:
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
Venue:
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Year:
2006

Citing 3
Cited 8

The Nonstochastic Multiarmed Bandit Problem

SIAM Journal on Computing
The First International Trading Agent Competition: Autonomous Bidding Agents

Electronic Commerce Research
Approximating game-theoretic optimal strategies for full-scale poker

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Better automated abstraction techniques for imperfect information games, with application to Texas Hold'em poker

Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
Strategy evaluation in extensive games with importance sampling

Proceedings of the 25th international conference on Machine learning
A Heuristic-Based Approach for a Betting Strategy in Texas Hold'em Poker

Proceedings of the 2008 conference on Tenth Scandinavian Conference on Artificial Intelligence: SCAI 2008
Learning a value analysis tool for agent evaluation

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
The grand challenge of computer Go: Monte Carlo tree search and extensions

Communications of the ACM
Baseline: practical control variates for agent evaluation in zero-sum domains

Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Do poker players know how good they are? Accuracy of poker skill estimation in online and offline players

Computers in Human Behavior

Abstract

Evaluating the performance of an agent or group of agents can be, by itself, a very challenging problem. The stochastic nature of the environment plus the stochastic nature of agents' decisions can result in estimates with intractably large variances. This paper examines the problem of finding low variance estimates of agent performance. In particular, we assume that some agent-environment dynamics are known, such as the random outcome of drawing a card or rolling a die. Other dynamics are unknown, such as the reasoning of a human or other black-box agent. Using the known dynamics, we describe the complete set of all unbiased estimators, that is, for any possible unknown dynamics the estimate's expectation is always the agent's expected utility. Then, given a belief about the unknown dynamics, we identify the unbiased estimator with minimum variance. If the belief is correct, our estimate is optimal, and if the belief is wrong, it is at least unbiased. Finally, we apply our unbiased estimator to the game of poker, demonstrating dramatically reduced variance and faster evaluation.
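The general idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's estimator or its poker experiment; it is a minimal control-variate-style illustration under invented assumptions. A fair die stands in for the known dynamics, `black_box_agent` is a hypothetical stand-in for the unknown dynamics, and `belief_value` is a hypothetical belief about how much each roll is worth. Because the correction term has zero mean under the known die distribution, the corrected estimate keeps the same expectation whatever the agent does; it only reduces variance when the belief is reasonably accurate.

```python
import random
import statistics

DIE = [1, 2, 3, 4, 5, 6]  # known dynamics: a fair six-sided die


def black_box_agent(roll, rng):
    """Hypothetical unknown dynamics: usually keep the roll as the payoff,
    otherwise gamble it for either nothing or double."""
    if rng.random() < 0.8:
        return roll
    return rng.choice([0, 2 * roll])


def belief_value(roll):
    """Hypothetical belief about the expected utility after a given roll.
    It may be wrong; unbiasedness does not depend on it."""
    return float(roll)


def corrected_estimate(roll, utility):
    """Subtract a term whose expectation under the known die distribution
    is zero, so the estimate's expectation is unchanged."""
    baseline = sum(belief_value(r) for r in DIE) / len(DIE)
    return utility - (belief_value(roll) - baseline)


def evaluate(n, seed=0):
    """Play n independent episodes and return the per-episode naive and
    corrected estimates of the agent's utility."""
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(n):
        roll = rng.choice(DIE)                 # chance event (known)
        utility = black_box_agent(roll, rng)   # agent decision (unknown)
        naive.append(float(utility))
        corrected.append(corrected_estimate(roll, utility))
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = evaluate(20000, seed=1)
    print("naive     mean/variance:",
          statistics.mean(naive), statistics.variance(naive))
    print("corrected mean/variance:",
          statistics.mean(corrected), statistics.variance(corrected))
```

In this toy setting both averages estimate the same expected utility, and the corrected samples have a smaller variance because the luck of the die roll has been subtracted out. Replacing `belief_value` with a poor guess would leave the mean unchanged and could increase the variance, which mirrors the abstract's point that a wrong belief costs optimality but not unbiasedness.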