Optimal unbiased estimators for evaluating agent performance

  • Authors:
  • Martin Zinkevich;Michael Bowling;Nolan Bard;Morgan Kan;Darse Billings

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada

  • Venue:
  • AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
  • Year:
  • 2006

Quantified Score

Hi-index 0.02

Abstract

Evaluating the performance of an agent or group of agents can be, by itself, a very challenging problem. The stochastic nature of the environment plus the stochastic nature of agents' decisions can result in estimates with intractably large variances. This paper examines the problem of finding low variance estimates of agent performance. In particular, we assume that some agent-environment dynamics are known, such as the random outcome of drawing a card or rolling a die. Other dynamics are unknown, such as the reasoning of a human or other black-box agent. Using the known dynamics, we describe the complete set of all unbiased estimators, that is, for any possible unknown dynamics the estimate's expectation is always the agent's expected utility. Then, given a belief about the unknown dynamics, we identify the unbiased estimator with minimum variance. If the belief is correct, our estimate is optimal, and if the belief is wrong, it is at least unbiased. Finally, we apply our unbiased estimator to the game of poker, demonstrating dramatically reduced variance and faster evaluation.
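
The idea of using known chance dynamics to lower variance without introducing bias can be illustrated with a small sketch. This is a generic control-variate example on a toy die game, not the estimator constructed in the paper. The game, the agent, and the names `black_box_agent`, `utility`, `believed_value`, and `simulate` are all hypothetical, chosen only for illustration. The correction term has zero mean under the known die distribution, so the corrected estimate stays unbiased whatever the agent does; how much variance it removes depends on how good the belief is.

```python
import random
import statistics

DIE = [1, 2, 3, 4, 5, 6]  # known chance dynamics: a fair die (toy assumption)


def black_box_agent(roll, rng):
    """Hypothetical unknown agent: tends to bet on high rolls, with noise."""
    return 1 if roll + rng.gauss(0, 1) > 3.5 else 0


def utility(roll, bet):
    """Toy payoff: betting scales (roll - 3.5) by 2, not betting by 0.5."""
    return (roll - 3.5) * (2.0 if bet else 0.5)


def believed_value(roll):
    """Our belief about expected utility given the roll; it may be wrong."""
    return (roll - 3.5) * (2.0 if roll > 3.5 else 0.5)


# Exact expectation of the belief under the known die distribution.
BASELINE = sum(believed_value(r) for r in DIE) / len(DIE)


def simulate(n, seed=0):
    """Return per-game naive utilities and control-variate-corrected values."""
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(n):
        roll = rng.choice(DIE)
        u = utility(roll, black_box_agent(roll, rng))
        naive.append(u)
        # believed_value(roll) - BASELINE has mean zero over the die roll,
        # so subtracting it does not change the expectation of the estimate.
        corrected.append(u - (believed_value(roll) - BASELINE))
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate(20000, seed=1)
    print("naive mean/var:    ", statistics.mean(naive), statistics.variance(naive))
    print("corrected mean/var:", statistics.mean(corrected), statistics.variance(corrected))
```

Both sample means estimate the same expected utility. In this toy setting the corrected values should show a noticeably smaller sample variance, because the belief tracks most of the payoff variation caused by the die.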