Explanation-Based Learning and Reinforcement Learning: A Unified View

  • Authors:
  • Thomas G. Dietterich; Nicholas S. Flann

  • Affiliations:
  • Department of Computer Science, Oregon State University, Corvallis, OR 97331-3202. E-mail: tgd@cs.orst.edu; Department of Computer Science, Utah State University, Logan, UT 84322-4205. E-mail: flann@nick.cs.usu.edu

  • Venue:
  • Machine Learning
  • Year:
  • 1997

Abstract

In speedup-learning problems, where full descriptions of operators are known, both explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. Most RL methods perform this propagation on a state-by-state basis, while EBL methods compute the weakest preconditions of operators, and hence perform this propagation on a region-by-region basis. Barto, Bradtke, and Singh (1995) have observed that many algorithms for reinforcement learning can be viewed as asynchronous dynamic programming. Based on this observation, this paper shows how to develop dynamic programming versions of EBL, which we call region-based dynamic programming or Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of point-based dynamic programming and to standard EBL. The results show that region-based dynamic programming combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of reinforcement learning algorithms (learning of optimal policies). Results are shown in chess endgames and in synthetic maze tasks.
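
Since the abstract contrasts point-based (per-state) and region-based (per-region) backward propagation, a small sketch may make the distinction concrete. The following is a minimal illustration, not the paper's algorithm: it uses a toy deterministic maze, and a breadth-first wavefront stands in for the weakest-precondition regions that EBL/EBRL would derive from operator descriptions. All identifiers (point_based_dp, region_based_dp, the maze layout) are hypothetical.

# A minimal sketch (not the authors' implementation) contrasting the
# two propagation styles on a toy deterministic maze with reward -1
# per step. A BFS wavefront stands in for EBL's weakest-precondition
# regions; all names and the maze itself are illustrative assumptions.

SIZE = 4
GOAL = (3, 3)
WALLS = {(1, 1), (2, 1)}
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # deterministic operators

def neighbors(s):
    """States reachable from s in one move (moves are reversible here)."""
    for dx, dy in MOVES:
        t = (s[0] + dx, s[1] + dy)
        if 0 <= t[0] < SIZE and 0 <= t[1] < SIZE and t not in WALLS:
            yield t

def point_based_dp():
    """Asynchronous value iteration: one backup per individual state,
    sweeping until values stop changing."""
    V = {(x, y): float("-inf") for x in range(SIZE) for y in range(SIZE)
         if (x, y) not in WALLS}
    V[GOAL] = 0.0
    changed = True
    while changed:
        changed = False
        for s in V:
            if s == GOAL:
                continue
            best = max(V[t] - 1 for t in neighbors(s))
            if best > V[s]:
                V[s] = best
                changed = True
    return V

def region_based_dp():
    """Region-style propagation: each backup assigns a value to a whole
    set of states at once, analogous to EBRL backing up over the
    weakest precondition of an operator sequence."""
    V, frontier, value = {GOAL: 0.0}, {GOAL}, 0
    while frontier:
        value -= 1
        region = {t for s in frontier for t in neighbors(s)} - V.keys()
        for s in region:
            V[s] = value  # one backup covers the entire region
        frontier = region
    return V

if __name__ == "__main__":
    assert point_based_dp() == region_based_dp()
    print("start-state value:", region_based_dp()[(0, 0)])  # -6 here

Both routines converge to the same optimal value function; the difference is that the region-based version performs one backup per region instead of one per state. In the paper's setting the regions are described intensionally by weakest preconditions rather than enumerated state by state, which is what allows EBRL to scale to large state spaces.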