Evaluating the performance of an agent or group of agents can be, by itself, a very challenging problem. The stochastic nature of the environment, combined with the stochastic nature of agents' decisions, can result in estimates with intractably large variances. This paper examines the problem of finding low-variance estimates of agent performance. In particular, we assume that some agent-environment dynamics are known, such as the random outcome of drawing a card or rolling a die. Other dynamics are unknown, such as the reasoning of a human or other black-box agent. Using the known dynamics, we describe the complete set of all unbiased estimators, that is, estimators whose expectation equals the agent's expected utility for any possible unknown dynamics. Then, given a belief about the unknown dynamics, we identify the unbiased estimator with minimum variance. If the belief is correct, our estimate is optimal; if the belief is wrong, it is at least unbiased. Finally, we apply our unbiased estimator to the game of poker, demonstrating dramatically reduced variance and faster evaluation.
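To make the idea of using known dynamics concrete, the following is a minimal sketch on a toy die game, not the estimator developed in the paper. Everything in it is an assumption made for illustration: the game, the hypothetical `black_box_agent`, the payoff in `utility`, and the belief encoded in `believed_value`. The die roll is the known dynamics, so the correction term `believed_value(roll) - BASELINE` has zero expectation no matter how the agent behaves. Subtracting it therefore leaves the estimate unbiased, and it reduces variance to the extent that the belief tracks the agent's actual behavior.

```python
import random
import statistics

DIE_FACES = range(1, 7)  # known dynamics: a fair six-sided die


def black_box_agent(roll, rng):
    """Hypothetical unknown agent: the evaluator cannot see this rule."""
    p_play = 0.9 if roll >= 4 else 0.2
    return rng.random() < p_play


def utility(roll, plays):
    """Toy payoff (an assumption): playing wins roll - 3, folding wins 0."""
    return float(roll - 3) if plays else 0.0


def believed_value(roll):
    """Evaluator's belief: the agent plays exactly when roll >= 4.

    The belief is deliberately imperfect; it only needs to be correlated
    with the true behavior to reduce variance.
    """
    return float(roll - 3) if roll >= 4 else 0.0


# Expectation of the believed value under the KNOWN die distribution.
BASELINE = sum(believed_value(r) for r in DIE_FACES) / len(DIE_FACES)


def sample(rng):
    """Play one game; return the naive and the corrected utility estimate."""
    roll = rng.randint(1, 6)
    u = utility(roll, black_box_agent(roll, rng))
    # The bracketed term has zero mean over the die roll, so subtracting it
    # cannot introduce bias, whatever the agent does.
    corrected = u - (believed_value(roll) - BASELINE)
    return u, corrected


def evaluate(n_games, seed=0):
    """Estimate the agent's expected utility with both estimators."""
    rng = random.Random(seed)
    naive, corrected = zip(*(sample(rng) for _ in range(n_games)))
    return {
        "naive_mean": statistics.fmean(naive),
        "naive_var": statistics.variance(naive),
        "corrected_mean": statistics.fmean(corrected),
        "corrected_var": statistics.variance(corrected),
    }


if __name__ == "__main__":
    for name, value in evaluate(20000).items():
        print(f"{name}: {value:.4f}")
```

In this toy game the agent's true expected utility is 0.8, and both estimators converge to it. The corrected estimator has lower variance because the believed value absorbs most of the randomness contributed by the die.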