Co-Evolution in the Successful Learning of Backgammon Strategy

  • Authors:
  • Jordan B. Pollack;Alan D. Blair

  • Affiliations:
  • Computer Science Department, Volen Center for Complex Systems, Brandeis University, Waltham, MA 02254. E-mail: pollack@cs.brandeis.edu; Computer Science Department, Volen Center for Complex Systems, Brandeis University, Waltham, MA 02254. E-mail: blair@cs.uq.edu.au

  • Venue:
  • Machine Learning
  • Year:
  • 1998

Abstract

Following Tesauro's work on TD-Gammon, we used a 4,000-parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement or temporal difference learning methods were employed. Instead we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger and changing weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a "meta-game" of self-learning.
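
The champion/challenger loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the backgammon game is replaced by a hypothetical noisy toy contest (`challenger_wins`), the Gaussian mutation and its scale `sigma` are assumed, and a winning challenger simply replaces the champion, since the abstract does not specify how the weights are changed. All function and parameter names are illustrative.

```python
import math
import random


def mutate(weights, sigma, rng):
    """Return a slightly perturbed copy of the weights (Gaussian noise is an assumption)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def challenger_wins(champion, challenger, target, rng):
    """Hypothetical stand-in for one game of backgammon.

    Each player's "strength" is its closeness to a hidden target vector, and
    the outcome is random, with the stronger player merely more likely to win.
    """
    def strength(w):
        return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))

    # Clamp the difference so math.exp cannot overflow.
    diff = max(-50.0, min(50.0, strength(challenger) - strength(champion)))
    return rng.random() < 1.0 / (1.0 + math.exp(-diff))


def hillclimb(n_weights, generations, sigma=0.05, seed=0):
    """Relative-fitness hillclimbing starting from an all-zero champion."""
    rng = random.Random(seed)
    target = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]  # toy game only
    champion = [0.0] * n_weights
    for _ in range(generations):
        challenger = mutate(champion, sigma, rng)
        if challenger_wins(champion, challenger, target, rng):
            # Assumed update rule: the winning challenger becomes the champion.
            champion = challenger
    return champion
```

Calling `hillclimb(8, 200)` returns the final champion's weight list. Note that fitness here is purely relative: the champion is never scored against a fixed benchmark, only against its own mutated offspring, and a single noisy game decides each update.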