Practical Issues in Temporal Difference Learning. Machine Learning.
Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning.
Machines that learn to play games.
Beyond Samuel: evolving a nearly expert checkers player. Advances in evolutionary computing.
Autonomous Agents and Multi-Agent Systems.
The results obtained by Pollack and Blair substantially underperform my 1992 TD learning results. This is shown by directly benchmarking the 1992 TD nets against Pubeval. A plausible hypothesis for this underperformance is that, unlike TD learning, the hillclimbing algorithm fails to capture nonlinear structure inherent in the problem and, despite the presence of hidden units, obtains only a linear approximation to the optimal policy for backgammon. Two lines of evidence supporting this hypothesis are discussed: the first comes from the structure of the Pubeval benchmark program, and the second from experiments replicating the Pollack and Blair results.
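One mechanism consistent with the linear-approximation hypothesis can be illustrated with a toy sketch (not the paper's experiment; the network shape and weight scale here are hypothetical): when a tanh-hidden-layer network's weights stay small, each hidden unit operates in the near-linear regime of tanh, so the whole network is well approximated by a single linear map of its inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer net: y = w2 . tanh(W1 x).
# Small weights keep the hidden pre-activations near zero,
# where tanh(x) ~ x, so the net collapses toward the
# linear map (w2 @ W1) x.
n_in, n_hidden = 10, 5
W1 = 0.01 * rng.standard_normal((n_hidden, n_in))
w2 = 0.01 * rng.standard_normal(n_hidden)

def net(x):
    return w2 @ np.tanh(W1 @ x)

# Evaluate the net on random inputs.
X = rng.standard_normal((1000, n_in))
y = np.array([net(x) for x in X])

# Fit the best purely linear predictor to the net's outputs.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_lin = X @ coef

# Fraction of the net's output variance the linear fit explains.
r2 = 1 - (y - y_lin).var() / y.var()
print(f"R^2 of linear fit: {r2:.6f}")
```

With this weight scale the linear fit explains essentially all of the network's output variance, so the hidden units add nothing that a linear policy could not represent; a learner whose search never drives the weights out of this regime would, in effect, learn only a linear policy.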