Practical Issues in Temporal Difference Learning. Machine Learning.
Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning.
Machines that learn to play games.
Beyond Samuel: evolving a nearly expert checkers player. Advances in evolutionary computing.
Autonomous Agents and Multi-Agent Systems.
The results obtained by Pollack and Blair substantially underperform my 1992 TD learning results. This is shown by directly benchmarking the 1992 TD nets against Pubeval. A plausible hypothesis for this underperformance is that, unlike TD learning, the hillclimbing algorithm fails to capture nonlinear structure inherent in the problem and, despite the presence of hidden units, obtains only a linear approximation to the optimal policy for backgammon. Two lines of evidence supporting this hypothesis are discussed: the first comes from the structure of the Pubeval benchmark program, and the second from experiments replicating the Pollack and Blair results.
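One mechanism consistent with the linear-approximation hypothesis can be illustrated with a toy sketch (not the paper's experiment; the network shape and weight scale here are hypothetical): when a tanh-hidden-layer network's weights stay small, each hidden unit operates in the near-linear regime of tanh, so the whole network is well approximated by a single linear map of its inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer net: y = w2 . tanh(W1 x).
# Small weights keep the hidden pre-activations near zero,
# where tanh(x) ~ x, so the net collapses toward the
# linear map (w2 @ W1) x.
n_in, n_hidden = 10, 5
W1 = 0.01 * rng.standard_normal((n_hidden, n_in))
w2 = 0.01 * rng.standard_normal(n_hidden)

def net(x):
    return w2 @ np.tanh(W1 @ x)

# Evaluate the net on random inputs.
X = rng.standard_normal((1000, n_in))
y = np.array([net(x) for x in X])

# Fit the best purely linear predictor to the net's outputs.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_lin = X @ coef

# Fraction of the net's output variance the linear fit explains.
r2 = 1 - (y - y_lin).var() / y.var()
print(f"R^2 of linear fit: {r2:.6f}")
```

With this weight scale the linear fit explains essentially all of the network's output variance, so the hidden units add nothing that a linear policy could not represent; a learner whose search never drives the weights out of this regime would, in effect, learn only a linear policy.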