Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty

  • Authors:
  • Nicolas Meuleau; Paul Bourgine

  • Affiliations:
  • Computer Science Department, Box 1910, Brown University, Providence, RI 02912, USA. nm@cs.brown.edu; Ecole Polytechnique, CREA, Route de Saclay, F-91128 Palaiseau cedex, France. bourgine@poly.polytechnique.fr

  • Venue:
  • Machine Learning
  • Year:
  • 1999

Abstract

This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. This technique may be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The first is to define a local measure of the uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks. In particular, a direct application of it leads to algorithms of low quality that can easily be misled by particular configurations of the environment. The second principle was introduced to eliminate this drawback. It consists of assimilating the local measures of uncertainty to rewards, and back-propagating them with the dynamic programming or temporal difference mechanisms. This makes it possible to reproduce global-scale reasoning about the uncertainty using only local measures of it. Numerical simulations clearly show the efficiency of these propositions.
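
The mechanism sketched in the abstract can be illustrated with a short example. The code below is not the authors' algorithm: it uses a simple count-based bonus as a stand-in for the paper's bandit-theoretic local uncertainty measure, and the ChainEnv class, function names, and parameters (alpha, gamma, beta) are illustrative assumptions. What it does demonstrate is the second principle: by folding the local uncertainty bonus into the reward, the temporal-difference update back-propagates uncertainty from poorly explored states to their predecessors.

```python
import numpy as np

class ChainEnv:
    """Toy deterministic chain MDP: action 1 moves right, action 0 moves left.
    Reaching the rightmost state yields reward 1 and ends the episode."""
    def __init__(self, n_states=10):
        self.n_states = n_states
        self.n_actions = 2
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n_states - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done


def q_learning_with_bonus(env, episodes=300, max_steps=200,
                          alpha=0.1, gamma=0.95, beta=0.5):
    Q = np.zeros((env.n_states, env.n_actions))
    counts = np.ones((env.n_states, env.n_actions))   # visit counts per (s, a)

    for _ in range(episodes):
        s, done = env.reset(), False
        for _ in range(max_steps):
            # Act greedily on value plus a local, count-based uncertainty bonus.
            a = int(np.argmax(Q[s] + beta / np.sqrt(counts[s])))
            s2, r, done = env.step(a)
            counts[s, a] += 1

            # Fold the bonus into the reward: the TD update then propagates
            # uncertainty about (s, a) back toward predecessor states.
            r_aug = r + beta / np.sqrt(counts[s, a])
            target = r_aug + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
            if done:
                break
    return Q


if __name__ == "__main__":
    Q = q_learning_with_bonus(ChainEnv())
    print("greedy policy per state:", np.argmax(Q, axis=1))
```

In this sketch the bonus term beta / sqrt(counts) plays the role of the local uncertainty measure: acting on it alone reproduces the first principle (and its shortsightedness), while adding it to the reward lets ordinary Q-learning carry the uncertainty information across states, which is the effect the abstract attributes to the second principle.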