Learning heuristic functions for large state spaces

  • Authors:
  • Shahab Jabbari Arfaee; Sandra Zilles; Robert C. Holte

  • Affiliations:
  • University of Alberta, Department of Computing Science, Edmonton, Alberta, Canada T6G 2H8; University of Regina, Department of Computer Science, Regina, Saskatchewan, Canada S4S 0A2; University of Alberta, Department of Computing Science, Edmonton, Alberta, Canada T6G 2H8

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2011

Abstract

We investigate the use of machine learning to create effective heuristics for search algorithms such as IDA* or heuristic-search planners such as FF. Our method aims to generate a sequence of heuristics from a given weak heuristic h_0 and a set of unsolved training instances using a bootstrapping procedure. The training instances that can be solved using h_0 provide training examples for a learning algorithm that produces a heuristic h_1 that is expected to be stronger than h_0. If h_0 is so weak that it cannot solve any of the given instances, we use random walks backward from the goal state to create a sequence of successively more difficult training instances, starting with ones that are guaranteed to be solvable by h_0. The bootstrap process is then repeated using h_i in lieu of h_{i-1} until a sufficiently strong heuristic is produced. We test this method on the 24-sliding-tile puzzle, the 35-pancake puzzle, Rubik's Cube, and the 20-blocks world. In every case our method produces a heuristic that allows IDA* to solve randomly generated problem instances quickly with solutions close to optimal. The total time for the bootstrap process to create strong heuristics for these large state spaces is on the order of days. To make the process effective when only a single problem instance needs to be solved, we present a variation in which the bootstrap learning of new heuristics is interleaved with problem-solving using the initial heuristic and whatever heuristics have been learned so far. This substantially reduces the total time needed to solve a single instance, while the solutions obtained are still close to optimal.
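
The bootstrapping loop described above can be sketched compactly. The following Python sketch is illustrative only, not the authors' implementation: the solve and learn callables, the per-round resource budget, and the choice to label every state on a solution path with its remaining cost are assumptions introduced here, and the backward-random-walk generation of easier instances and the interleaved single-instance variant are omitted.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type aliases for illustration only (not from the paper).
State = object                              # a search state, e.g. a tile configuration
Heuristic = Callable[[State], float]        # maps a state to an estimated cost-to-goal
Example = Tuple[State, int]                 # (state, observed solution cost)


def bootstrap_heuristic(
    h0: Heuristic,
    instances: List[State],
    solve: Callable[[State, Heuristic], Optional[List[State]]],
    learn: Callable[[List[Example]], Heuristic],
    max_rounds: int = 10,
) -> Heuristic:
    """Strengthen a weak heuristic by bootstrapping on solvable instances.

    Each round, the instances solvable with the current heuristic (within
    the solver's time/node budget) yield labelled training examples; a new
    heuristic is learned from them and the process repeats on the
    remaining, harder instances.
    """
    h = h0
    for _ in range(max_rounds):
        examples: List[Example] = []
        unsolved: List[State] = []
        for s in instances:
            path = solve(s, h)              # e.g. IDA* with a resource limit
            if path is None:
                unsolved.append(s)
                continue
            # Label every state on the solution path with its remaining cost.
            for depth, state in enumerate(path):
                examples.append((state, len(path) - 1 - depth))
        if not examples or not unsolved:
            break                           # nothing learned, or everything solved
        h = learn(examples)                 # train the next, hopefully stronger, heuristic
        instances = unsolved                # continue on the instances still too hard
    return h
```

In this reading, each learned heuristic plays the role of h_i and is produced from examples gathered with h_{i-1}; how the learner generalizes from those examples is left entirely to the supplied learn callable.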