Temporal difference learning and TD-Gammon
Communications of the ACM
Co-Evolution in the Successful Learning of Backgammon Strategy
Machine Learning
Reinforcement Learning
Pattern Recognition and Neural Networks
Neuro-Dynamic Programming
Crossover, Macromutation, and Population-Based Search
Proceedings of the 6th International Conference on Genetic Algorithms
GECCO '96: Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation
Coevolutionary temporal difference learning for Othello
CIG'09 Proceedings of the 5th international conference on Computational Intelligence and Games
Evolving small-board Go players using coevolutionary temporal difference learning with archives
International Journal of Applied Mathematics and Computer Science
Neural networks have been used extensively as a vehicle for both genetic algorithms and reinforcement learning. This paper shows a natural way to combine the two methods and suggests that reinforcement learning may be superior to random mutation as an engine for discovering useful substructures. The paper also describes a software experiment that applies this technique to produce an Othello-playing computer program. The experiment subjects a pool of Othello-playing programs to a regime of successive adaptation cycles, where each cycle consists of an evolutionary phase, based on the genetic algorithm, followed by a learning phase, based on reinforcement learning. A key idea of the genetic implementation is the concept of feature-level crossover. The regime ran for three months, spanning 900,000 individual matches of Othello, and ultimately yielded a program that is competitive with a human-designed Othello program playing at a roughly intermediate level.
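The alternating regime described in the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the fitness function stands in for round-robin Othello match results, and the learning phase stands in for TD self-play updates (here each weight is simply nudged toward a placeholder target). The names `feature_level_crossover`, `td_learning_phase`, and the constants `N_FEATURES` and `POP_SIZE` are assumptions chosen for the sketch, not identifiers from the paper.

```python
import random

random.seed(0)

N_FEATURES = 8   # hypothetical number of board features per evaluator
POP_SIZE = 6     # hypothetical population size

def random_individual():
    # An individual is a weight vector over board features.
    return [random.uniform(-1, 1) for _ in range(N_FEATURES)]

def fitness(ind):
    # Stand-in for tournament results: closer to the target weights is better.
    return -sum((w - 0.5) ** 2 for w in ind)

def feature_level_crossover(a, b):
    # Each feature's weight is inherited intact from one parent, so
    # substructures discovered by learning survive recombination.
    return [random.choice(pair) for pair in zip(a, b)]

def td_learning_phase(ind, steps=20, alpha=0.05):
    # Stand-in for TD(0) self-play: nudge a random feature weight toward
    # a fixed target (0.5) as a placeholder for the TD error signal.
    for _ in range(steps):
        j = random.randrange(N_FEATURES)
        ind[j] += alpha * (0.5 - ind[j])
    return ind

pop = [random_individual() for _ in range(POP_SIZE)]
for cycle in range(10):
    # Evolutionary phase: keep the fitter half, refill via crossover.
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP_SIZE // 2]
    children = [feature_level_crossover(random.choice(parents),
                                        random.choice(parents))
                for _ in range(POP_SIZE - len(parents))]
    pop = parents + children
    # Learning phase: each individual refines its weights between generations.
    pop = [td_learning_phase(ind) for ind in pop]

print(round(fitness(max(pop, key=fitness)), 3))
```

The design point the sketch tries to capture is that learning, rather than random mutation, supplies the within-generation variation: crossover only recombines feature weights that the learning phase has already shaped.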