Feature Construction for Reinforcement Learning in Hearts

  • Authors: Nathan R. Sturtevant; Adam M. White

  • Affiliations: Department of Computing Science, University of Alberta, Edmonton, Canada (both authors)

  • Venue: CG'06 Proceedings of the 5th international conference on Computers and games
  • Year: 2006

Abstract

Temporal difference (TD) learning has been used to learn strong evaluation functions in a variety of two-player games. TD-Gammon illustrated how the combination of game-tree search and learning methods can achieve grandmaster-level play in backgammon. In this work, we develop a player for the game of Hearts, a four-player game, based on stochastic linear regression and TD learning. Starting from a small set of basic game features, we exhaustively combined them into a more expressive representation of the game state. We report initial results on learning with various combinations of features, and on training both under self-play and against search-based players. Our simple learner was able to beat one of the best search-based Hearts programs.
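
As a rough illustration of the two ideas named in the abstract, the sketch below combines binary "atomic" features into conjunctions and applies a single TD(0) update to a linear value function. It is a minimal example under stated assumptions, not the authors' implementation: the function names (`combine_features`, `td0_update`), the use of AND-conjunctions, the choice of TD(0), and the step size are all hypothetical choices made here for illustration.

```python
from itertools import combinations


def combine_features(atomic, order=2):
    """Return the atomic binary features followed by the AND-conjunction
    of every subset of size 2..order (an assumed combination scheme)."""
    combined = list(atomic)
    for k in range(2, order + 1):
        for idx in combinations(range(len(atomic)), k):
            combined.append(int(all(atomic[i] for i in idx)))
    return combined


def value(weights, features):
    """Linear value estimate: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def td0_update(weights, features, reward, next_features, alpha=0.01, gamma=1.0):
    """One TD(0) step for a linear value function.

    Pass next_features=None for a terminal transition, in which case the
    value of the next state is taken to be zero.
    """
    next_value = 0.0 if next_features is None else value(weights, next_features)
    td_error = reward + gamma * next_value - value(weights, features)
    # Gradient of a linear value function w.r.t. the weights is the feature vector.
    return [w + alpha * td_error * f for w, f in zip(weights, features)]


# Hypothetical usage: three atomic features expanded to all pairs.
atomic = [1, 0, 1]
phi = combine_features(atomic, order=2)
weights = td0_update([0.0] * len(phi), phi, reward=1.0, next_features=None, alpha=0.1)
```

With three atomic features and `order=2`, the expanded vector has six entries (three atomic plus three pairs). The number of conjunctions grows combinatorially with the number of atomic features and the order, which is why a small base feature set matters for this kind of exhaustive combination.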