Optimized look-ahead tree search policies

  • Authors:
  • Francis Maes;Louis Wehenkel;Damien Ernst

  • Affiliations:
  • Dept. of Electrical Engineering and Computer Science, Institut Montefiore, University of Liège, Liège, Belgium;Dept. of Electrical Engineering and Computer Science, Institut Montefiore, University of Liège, Liège, Belgium;Dept. of Electrical Engineering and Computer Science, Institut Montefiore, University of Liège, Liège, Belgium

  • Venue:
  • EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
  • Year:
  • 2011

Abstract

In this paper, we consider look-ahead tree techniques for the discrete-time control of a deterministic dynamical system so as to maximize a sum of discounted rewards over an infinite time horizon. Given the current system state x_t at time t, these techniques explore the look-ahead tree representing possible evolutions of the system states and rewards conditioned on subsequent actions u_t, u_{t+1}, …. When the computing budget is exhausted, they output the action u_t that led to the best sequence of discounted rewards found. In this context, we are interested in computing good strategies for exploring the look-ahead tree. We propose a generic approach that looks for such strategies by solving an optimization problem whose objective is to compute a (budget-compliant) tree-exploration strategy yielding a control policy that maximizes the average return over a postulated set of initial states. This generic approach is fully specified for the case where the space of candidate tree-exploration strategies consists of "best-first" strategies parameterized by a linear combination of look-ahead path features (some of which have been advocated in the literature before) and where the optimization problem is solved using an estimation of distribution algorithm (EDA) based on Gaussian distributions. Numerical experiments carried out on a model of the treatment of HIV infection show that the optimized tree-exploration strategy is orders of magnitude better than the previously advocated ones.
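
The ideas in the abstract can be illustrated with a minimal Python sketch; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy dynamics `step` (not the HIV model), the action set, the discount factor, the three path features, the truncated horizon used to approximate the infinite-horizon return, and the convention that the budget counts node expansions. It shows a best-first exploration whose node priority is a linear combination of path features, and a simple Gaussian EDA that tunes the weights `theta` to maximize the average return over a set of initial states.

```python
import heapq
import random

GAMMA = 0.9          # discount factor (assumed value)
ACTIONS = (-1, +1)   # hypothetical discrete action set


def step(x, u):
    """Toy deterministic dynamics and reward (an assumption, not the HIV model)."""
    x_next = max(-5, min(5, x + u))
    return x_next, (1.0 if x_next == 3 else 0.0)


def features(depth, ret, reward):
    """Hypothetical path features: depth, discounted return so far, last reward."""
    return (float(depth), ret, reward)


def lookahead_action(x0, theta, budget):
    """Best-first exploration; returns the first action of the best path found."""
    counter = 0  # unique tie-breaker so heap entries never compare states
    # Entries: (-priority, counter, state, depth, discounted return, first action)
    open_nodes = [(0.0, counter, x0, 0, 0.0, None)]
    best_return, best_action = float("-inf"), ACTIONS[0]
    expansions = 0
    while open_nodes and expansions < budget:
        _, _, x, depth, ret, first = heapq.heappop(open_nodes)
        expansions += 1
        for u in ACTIONS:
            x_next, r = step(x, u)
            new_ret = ret + (GAMMA ** depth) * r
            new_first = u if first is None else first
            if new_ret > best_return:
                best_return, best_action = new_ret, new_first
            feats = features(depth + 1, new_ret, r)
            score = sum(w * f for w, f in zip(theta, feats))
            counter += 1
            heapq.heappush(
                open_nodes,
                (-score, counter, x_next, depth + 1, new_ret, new_first),
            )
    return best_action


def policy_return(x0, theta, budget, horizon=20):
    """Discounted return of the look-ahead policy over a truncated horizon."""
    x, total = x0, 0.0
    for t in range(horizon):
        u = lookahead_action(x, theta, budget)
        x, r = step(x, u)
        total += (GAMMA ** t) * r
    return total


def gaussian_eda(initial_states, budget, iterations=5, population=20,
                 elite=5, seed=0):
    """Simple EDA with independent Gaussians over the feature weights."""
    rng = random.Random(seed)
    dim = 3
    mean, std = [0.0] * dim, [1.0] * dim
    for _ in range(iterations):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Rank candidates by summed return (same ordering as the average).
        samples.sort(
            key=lambda th: sum(policy_return(x0, th, budget)
                               for x0 in initial_states),
            reverse=True,
        )
        best = samples[:elite]
        # Refit the Gaussian to the elite samples.
        mean = [sum(th[i] for th in best) / elite for i in range(dim)]
        std = [max(1e-3, (sum((th[i] - mean[i]) ** 2 for th in best)
                          / elite) ** 0.5) for i in range(dim)]
    return mean
```

With all weights set to zero, every node has the same priority and the tie-breaker makes the exploration breadth-first, which gives a convenient baseline against which optimized weights can be compared, for example via `policy_return(0, gaussian_eda([0, -2], budget=10), budget=10)`.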