Baseline: practical control variates for agent evaluation in zero-sum domains

  • Authors:
  • Josh Davidson, Christopher Archibald, Michael Bowling

  • Affiliations:
  • University of Alberta, Edmonton, AB, Canada (all authors)

  • Venue:
  • Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems
  • Year:
  • 2013

Abstract

Agent evaluation in stochastic domains can be difficult. The commonplace approach of Monte Carlo evaluation can involve a prohibitive number of simulations when the variance of the outcome is high. In such domains, variance reduction techniques are necessary, but these techniques require careful encoding of domain knowledge. This paper introduces baseline as a simple approach to creating low-variance estimators for zero-sum multi-agent domains with high outcome variance. The baseline method leverages the self-play of any available agent to produce a control variate for variance reduction, sidestepping the extra complexity inherent in traditional approaches. The baseline method is also applicable in situations where existing techniques either require extensive implementation overhead or simply cannot be applied. Experimental variance reduction results are shown in both of these cases using the baseline method. Baseline is shown to surpass state-of-the-art techniques in three-player computer poker and is competitive in two-player computer poker games. Baseline also shows variance reduction in human poker and in a mock Ad Auction tournament from the Trading Agent Competition, domains where variance reduction methods are not typically employed.
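
The abstract's central idea, using a correlated self-play outcome as a control variate, follows the standard control-variate recipe. The sketch below is a minimal, generic illustration in Python on synthetic data, not the paper's actual baseline construction: the function name, the shared "luck" model, and the assumption that the baseline quantity has a known mean of zero (as in symmetric zero-sum self-play) are all illustrative assumptions.

```python
import numpy as np

def control_variate_estimate(outcomes, baselines, baseline_mean=0.0):
    """Variance-reduced estimate of E[outcomes] using a control variate.

    outcomes      : per-sample payoffs of the agent being evaluated.
    baselines     : correlated per-sample quantity with known mean, e.g. a
                    baseline agent's self-play payoff on the same chance
                    events (hypothetical setup, not the paper's exact one).
    baseline_mean : known expectation of the baseline quantity; taken to be
                    0 for self-play in a symmetric zero-sum game.
    """
    x = np.asarray(outcomes, dtype=float)
    b = np.asarray(baselines, dtype=float)

    # Optimal scaling coefficient c* = Cov(X, B) / Var(B).
    cov = np.cov(x, b, ddof=1)
    c_star = cov[0, 1] / cov[1, 1]

    # Control-variate corrected samples: X - c*(B - E[B]).
    adjusted = x - c_star * (b - baseline_mean)
    return adjusted.mean(), adjusted.std(ddof=1) / np.sqrt(len(adjusted))


# Synthetic illustration: a shared "luck" term dominates the raw outcome
# variance but largely cancels once the correlated baseline is subtracted.
rng = np.random.default_rng(0)
luck = rng.normal(0.0, 5.0, size=10_000)                # shared chance events
outcomes = 0.1 + luck + rng.normal(0.0, 1.0, 10_000)    # evaluated agent's payoffs
baselines = luck + rng.normal(0.0, 1.0, 10_000)         # baseline self-play payoffs

naive_se = outcomes.std(ddof=1) / np.sqrt(len(outcomes))
cv_mean, cv_se = control_variate_estimate(outcomes, baselines)
print(f"naive SE: {naive_se:.4f}  control-variate SE: {cv_se:.4f}")
```

Because the shared luck term dominates the outcome variance in this toy setup, the corrected estimator's standard error is much smaller than the naive Monte Carlo one, which is the kind of effect the abstract describes for high-variance domains.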