Anticipatory Learning Classifier Systems and Factored Reinforcement Learning

  • Authors:
  • Olivier Sigaud; Martin V. Butz; Olga Kozlova; Christophe Meyer

  • Affiliations:
  • Institut des Systèmes Intelligents et de Robotique (ISIR), CNRS UMR 7222, Université Pierre et Marie Curie - Paris 6, Paris, France F-75005; University of Würzburg, Würzburg, Germany 97070; Institut des Systèmes Intelligents et de Robotique (ISIR), CNRS UMR 7222, Université Pierre et Marie Curie - Paris 6, Paris, France F-75005 and Thales Security Solutions & Services, Simul ...; Thales Security Solutions & Services, ThereSIS Research and Innovation Office, Palaiseau Cedex, France F-91767

  • Venue:
  • Anticipatory Behavior in Adaptive Learning Systems
  • Year:
  • 2009

Abstract

Factored Reinforcement Learning (FRL) is a new technique for solving Factored Markov Decision Problems (FMDPs) when the structure of the problem is not known in advance. Like Anticipatory Learning Classifier Systems (ALCSs), it is a model-based Reinforcement Learning approach that includes generalization mechanisms in the presence of a structured domain. In general, FRL and ALCSs are explicit, state-anticipatory approaches that learn generalized state transition models to improve system behavior based on model-based reinforcement learning techniques. In this contribution, we highlight the conceptual similarities and differences between FRL and ALCSs, focusing on SPITI, an instance of an FRL method, on the one hand, and on two ALCSs, MACS and XACS, on the other hand. Though FRL systems seem to benefit from a clearer theoretical grounding, an empirical comparison between SPITI and XACS on two benchmark problems reveals that the latter scales much better than the former when some combinations of state variables do not occur. Based on this finding, we discuss the mechanisms in XACS that result in better scalability, and propose importing these mechanisms into FRL systems.
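To make the factored-transition-model idea concrete, here is a minimal sketch in Python of what the abstract refers to. This is not the paper's SPITI or XACS code; the toy domain (two boolean state variables, hypothetical actions `toggle` and `wait`) is invented for illustration. The point is that in an FMDP the transition probability factors per state variable, P(s' | s, a) = ∏ᵢ P(s'ᵢ | parentsᵢ(s), a), so each variable's model only needs a small parent set — which is what enables generalization over the state space.

```python
def p_next_var(var, value, state, action):
    """P(var' = value | state, action) for a hypothetical toy domain."""
    if var == "light_on":
        # "toggle" flips the light deterministically; "wait" leaves it alone.
        cur = state["light_on"]
        nxt = (not cur) if action == "toggle" else cur
        return 1.0 if value == nxt else 0.0
    if var == "door_open":
        # The door sticks: once open it stays open with probability 0.9,
        # independent of the light -- its parent set excludes "light_on".
        if state["door_open"]:
            return 0.9 if value else 0.1
        return 0.0 if value else 1.0
    raise ValueError(f"unknown state variable: {var}")

def p_transition(next_state, state, action):
    """Factored transition model: product of per-variable factors."""
    p = 1.0
    for var, value in next_state.items():
        p *= p_next_var(var, value, state, action)
    return p

s = {"light_on": False, "door_open": True}
print(p_transition({"light_on": True, "door_open": True}, s, "toggle"))  # → 0.9
```

Systems like SPITI learn such per-variable models as decision trees over the parent variables; the comparison in the paper concerns how well each system's representation copes when many variable combinations never occur.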