Comments on “Co-Evolution in the Successful Learning of Backgammon Strategy”

  • Authors:
  • Gerald Tesauro

  • Affiliations:
  • IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598. E-mail: tesauro@watson.ibm.com

  • Venue:
  • Machine Learning
  • Year:
  • 1998

Abstract

The results obtained by Pollack and Blair substantially underperform my 1992 TD Learning results. This is shown by directly benchmarking the 1992 TD nets against Pubeval. A plausible hypothesis for this underperformance is that, unlike TD learning, the hillclimbing algorithm fails to capture nonlinear structure inherent in the problem, and despite the presence of hidden units, only obtains a linear approximation to the optimal policy for backgammon. Two lines of evidence supporting this hypothesis are discussed, the first coming from the structure of the Pubeval benchmark program, and the second coming from experiments replicating the Pollack and Blair results.
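
For readers unfamiliar with the term, the following is a minimal sketch of a generic hillclimbing loop of the kind the abstract contrasts with TD learning. It is not taken from the paper: the Gaussian mutation, the 5% blending step, the function names `hillclimb` and `toy_fitness`, and the quadratic toy objective (a stand-in for comparing two players by game outcomes) are all assumptions chosen for illustration.

```python
import random


def hillclimb(fitness, dim, steps=2000, sigma=0.1, blend=0.05, seed=0):
    """Generic hillclimbing over a weight vector (illustrative only).

    A mutated "challenger" is compared with the current "champion";
    when the challenger scores higher, the champion moves a small
    fraction (blend) of the way toward it. All parameter values are
    hypothetical defaults, not values from the paper.
    """
    rng = random.Random(seed)
    champion = [0.0] * dim
    for _ in range(steps):
        # Mutate every weight with Gaussian noise.
        challenger = [w + rng.gauss(0.0, sigma) for w in champion]
        # In a game setting this comparison would be a set of matches;
        # here a toy fitness function stands in for it.
        if fitness(challenger) > fitness(champion):
            champion = [(1 - blend) * w + blend * c
                        for w, c in zip(champion, challenger)]
    return champion


def toy_fitness(weights):
    """Hypothetical objective: negative squared distance to a fixed target."""
    target = [1.0, -2.0, 0.5]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


result = hillclimb(toy_fitness, dim=3)
print(toy_fitness([0.0, 0.0, 0.0]), toy_fitness(result))
```

Unlike TD learning, a loop of this kind uses only the outcome of each comparison and no gradient information from within a game, which is one reason it might plausibly settle on a simpler approximation of the policy.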