Counter example for Q-bucket-brigade under prediction problem

  • Authors:
  • Atsushi Wada, Keiki Takadama, Katsunori Shimohara

  • Affiliations:
  • National Institute of Information and Communications Technology and ATR Cognitive Information Science Laboratories, Kyoto, Japan; Department of Human Communication, The University of Electro-Communications, Chofu, Tokyo, Japan; Faculty of Engineering, Doshisha University, Kyotanabe, Kyoto, Japan

  • Venue:
  • IWLCS'03-05: Proceedings of the 2003-2005 International Workshops on Learning Classifier Systems
  • Year:
  • 2007

Abstract

Aiming to clarify the convergence and divergence conditions for the Learning Classifier System (LCS), this paper explores: (1) an extreme condition under which the reinforcement process of LCS diverges; and (2) methods to avoid such divergence. Building on our previous work, which showed the equivalence between LCS's reinforcement process and Reinforcement Learning (RL) with Function Approximation (FA), we present a counterexample for LCS with the Q-bucket-brigade based on the 11-state star problem, which was originally proposed to show the divergence of Q-learning with linear FA. Empirical results from applying the counterexample to LCS verified the predictions of the theory: (1) LCS with the Q-bucket-brigade diverged under prediction problems, in which the action-selection policy is fixed; and (2) such divergence was avoided by using the implicit-bucket-brigade or by applying the residual gradient algorithm to the Q-bucket-brigade.
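
The divergence phenomenon the abstract refers to can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions, not a reproduction of the paper's experiment: it uses Baird's widely known 7-state star counterexample for the prediction setting (value estimation under a fixed policy, all rewards zero, linear features; the choices gamma = 0.99, alpha = 0.01, and 1000 sweeps are illustrative) rather than the paper's 11-state Q-learning variant. It contrasts the semi-gradient TD update, which corresponds to the diverging case, with the residual gradient update, which corresponds to the convergent fix mentioned in the abstract.

```python
import numpy as np

# Baird-style star counterexample: six "spoke" states plus one "hub" state,
# all rewards zero, so the true value function is identically zero. Yet
# semi-gradient TD with linear FA drives the weights to infinity, while the
# residual gradient update converges.

GAMMA = 0.99
ALPHA = 0.01
N_UPPER = 6
N_STATES = N_UPPER + 1
N_WEIGHTS = 8
HUB = 6  # the fixed target policy moves every state to the hub, reward 0

# Linear features: V(spoke_i) = 2*w_i + w_7, V(hub) = w_6 + 2*w_7.
PHI = np.zeros((N_STATES, N_WEIGHTS))
for i in range(N_UPPER):
    PHI[i, i] = 2.0
    PHI[i, 7] = 1.0
PHI[HUB, 6] = 1.0
PHI[HUB, 7] = 2.0


def sweep(w, residual):
    """One synchronous (DP-style) sweep over all states.

    residual=False: semi-gradient TD,  w += a * delta * phi(s)
    residual=True:  residual gradient, w += a * delta * (phi(s) - g*phi(s'))
    where delta = 0 + g*V(s') - V(s) and s' is always the hub state.
    """
    v = PHI @ w          # current value estimates, held fixed for the sweep
    new_w = w.copy()
    for s in range(N_STATES):
        delta = GAMMA * v[HUB] - v[s]          # Bellman error (reward = 0)
        if residual:
            grad = PHI[s] - GAMMA * PHI[HUB]   # -d(delta)/dw: descends 0.5*delta**2
        else:
            grad = PHI[s]                      # semi-gradient: next-state term ignored
        new_w += ALPHA * delta * grad
    return new_w


for residual in (False, True):
    w = np.ones(N_WEIGHTS)
    w[6] = 10.0  # standard skewed initialization for this example
    for _ in range(1000):
        w = sweep(w, residual)
    label = "residual gradient" if residual else "semi-gradient TD"
    print(f"{label:>18}: max |w| after 1000 sweeps = {np.abs(w).max():.3g}")
```

Because the transitions under the fixed policy are deterministic (every state moves to the hub), the residual gradient update here is an exact gradient step on the squared Bellman error and sidesteps the double-sampling issue that arises in stochastic environments; the semi-gradient run, by contrast, sends the weights off to infinity, the same kind of divergence the abstract reports for the Q-bucket-brigade under prediction problems.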