Optimized look-ahead tree search policies

Authors:
Francis Maes;Louis Wehenkel;Damien Ernst
Affiliations:
Dept. of Electrical Engineering and Computer Science, Institut Montefiore, University of Liège, Liège, Belgium;Dept. of Electrical Engineering and Computer Science, Institut Montefiore, University of Liège, Liège, Belgium;Dept. of Electrical Engineering and Computer Science, Institut Montefiore, University of Liège, Liège, Belgium
Venue:
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Year:
2011

Abstract

We consider in this paper look-ahead tree techniques for the discrete-time control of a deterministic dynamical system so as to maximize a sum of discounted rewards over an infinite time horizon. Given the current system state x_t at time t, these techniques explore the look-ahead tree representing possible evolutions of the system states and rewards conditioned on subsequent actions u_t, u_{t+1}, …. When the computing budget is exhausted, they output the action u_t that led to the best sequence of discounted rewards found. In this context, we are interested in computing good strategies for exploring the look-ahead tree. We propose a generic approach that looks for such strategies by solving an optimization problem whose objective is to compute a (budget-compliant) tree-exploration strategy yielding a control policy that maximizes the average return over a postulated set of initial states. This generic approach is fully specified for the case where the space of candidate tree-exploration strategies consists of "best-first" strategies parameterized by a linear combination of look-ahead path features (some of which have been advocated in the literature before), and where the optimization problem is solved using an estimation of distribution algorithm (EDA) based on Gaussian distributions. Numerical experiments carried out on a model of the treatment of HIV infection show that the optimized tree-exploration strategy is orders of magnitude better than the previously advocated ones.
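The two ingredients described in the abstract, a budget-limited best-first look-ahead search scored by a linear combination of path features and a Gaussian EDA that tunes the weights, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy system `toy_step`, the three features, the discount factor, the definition of the budget as a number of node expansions, and the truncation of the infinite horizon to a few steps are all assumptions made for illustration only.

```python
import heapq
import itertools
import random
import statistics

GAMMA = 0.9  # discount factor (assumed value)

def toy_step(x, u):
    """Hypothetical deterministic system: integer state, reward 1 on reaching state 3."""
    x_next = x + u
    return x_next, (1.0 if x_next == 3 else 0.0)

def features(depth, discounted_return, last_reward):
    """Illustrative look-ahead path features (assumed, not taken from the paper)."""
    return (float(depth), discounted_return, last_reward)

def lookahead_action(x0, theta, budget, actions=(-1, 1), gamma=GAMMA):
    """Best-first tree exploration; returns the first action of the best path found.

    `budget` is taken here to be the number of node expansions (an assumption).
    """
    tie = itertools.count()  # unique tie-breaker so heap entries never compare states
    # Heap entries: (-score, tie, state, depth, discounted return so far, first action)
    frontier = [(0.0, next(tie), x0, 0, 0.0, None)]
    best_return, best_action = float("-inf"), actions[0]
    for _ in range(budget):
        if not frontier:
            break
        _, _, x, depth, ret, first = heapq.heappop(frontier)
        for u in actions:
            x_next, r = toy_step(x, u)
            child_ret = ret + (gamma ** depth) * r
            child_first = u if first is None else first
            if child_ret > best_return:
                best_return, best_action = child_ret, child_first
            feats = features(depth + 1, child_ret, r)
            score = sum(w * f for w, f in zip(theta, feats))
            heapq.heappush(
                frontier,
                (-score, next(tie), x_next, depth + 1, child_ret, child_first),
            )
    return best_action

def policy_return(theta, x0, budget, horizon=8, gamma=GAMMA):
    """Discounted return of the look-ahead policy, truncated to `horizon` steps."""
    x, total = x0, 0.0
    for t in range(horizon):
        u = lookahead_action(x, theta, budget)
        x, r = toy_step(x, u)
        total += (gamma ** t) * r
    return total

def gaussian_eda(initial_states, budget, iterations=5, population=20, elite=5, seed=0):
    """Simple EDA with independent Gaussians over the three feature weights."""
    rng = random.Random(seed)
    mean, std = [0.0] * 3, [1.0] * 3

    def objective(theta):
        # Average truncated return over the postulated set of initial states
        return sum(policy_return(theta, x0, budget) for x0 in initial_states) / len(initial_states)

    for _ in range(iterations):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(population)]
        samples.sort(key=objective, reverse=True)
        best = samples[:elite]
        # Refit the Gaussian to the elite samples; the small constant avoids zero variance
        mean = [statistics.mean(col) for col in zip(*best)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*best)]
    return mean
```

Under these assumptions, `gaussian_eda([0, 1], budget=7)` would return a weight vector that can be passed to `lookahead_action` as `theta`. A weight vector such as `(-1.0, 0.0, 0.0)` penalizes depth only, which makes the exploration behave like a uniform (breadth-first) search and gives a simple baseline to compare against.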