Chess Neighborhoods, Function Combination, and Reinforcement Learning

Authors:
Robert Levinson;Ryan Weber
Affiliations:
-;-
Venue:
CG '00 Revised Papers from the Second International Conference on Computers and Games
Year:
2000

Citing 12
Cited 1

Heuristics: intelligent search strategies for computer problem solving

Heuristics: intelligent search strategies for computer problem solving
Adaptive signal processing

Adaptive signal processing
Practical Issues in Temporal Difference Learning

Machine Learning
A game-learning machine

A game-learning machine
Temporal difference learning and TD-Gammon

Communications of the ACM
On-line learning of linear functions

Computational Complexity
Linear least-squares algorithms for temporal difference learning

Machine Learning - Special issue on reinforcement learning
Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
Learning to Predict by the Methods of Temporal Differences

Machine Learning
KnightCap: A Chess Programm That Learns by Combining TD(lambda) with Game-Tree Search

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
First Results from Using Temporal Difference Learning in Shogi

CG '98 Proceedings of the First International Conference on Computers and Games
Some studies in machine learning using the game of checkers

IBM Journal of Research and Development

Design with shape grammars and reinforcement learning

Advanced Engineering Informatics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Over the years, various research projects have attempted to develop a chess program that learns to play well given little prior knowledge beyond the rules of the game. Early on it was recognized that the key would be to adequately represent the relationships between the pieces and to evaluate the strengths or weaknesses of such relationships. As such, representations have developed, including a graph-based model. In this paper we extend the work on graph representation to a precise type of graph that we call a piece or square neighborhood. Specifically, a chessboard is represented as 64 neighborhoods, one for each square. Each neighborhood has a center, and 16 satellites corresponding to the pieces that are immediately close on the 4 diagonals, 2 ranks, 2 files, and 8 knight moves related to the square. Games are played and training values for boards are developed using temporal difference learning, as in other reinforcement learning systems. We then use a 2-layer regression network to learn. At the lower level the values (expected probability of winning) of the neighborhoods are learned and at the top they are combined based on their product and entropy. We report on relevant experiments including a learning experience on the Internet Chess Club (ICC) from which we can estimate a rating for the new program. The level of chess play achieved in a few days of training is comparable to a few months of work on previous systems such as Morph which is described as "one of the best from-scratch game learning systems, perhaps the best" [22].