Counter example for Q-bucket-brigade under prediction problem

  • Authors:
  • Atsushi Wada, Keiki Takadama, Katsunori Shimohara

  • Affiliations:
  • Atsushi Wada: ATR NIS, Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan
  • Keiki Takadama: Tokyo Institute of Technology, Nagatsuta-cho, Midori-ku, Kanagawa, Japan
  • Katsunori Shimohara: ATR NIS, Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan

  • Venue:
  • GECCO '05: Proceedings of the 7th annual conference on Genetic and evolutionary computation
  • Year:
  • 2005

Abstract

Aiming to clarify the convergence and divergence conditions for Learning Classifier Systems (LCS), this paper explores: (1) an extreme condition under which the reinforcement process of an LCS diverges; and (2) methods for avoiding such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of an LCS and reinforcement learning (RL) with a function approximation (FA) method, we present a counterexample for an LCS with Q-bucket-brigade based on the 11-state star problem, a counterexample originally proposed to show the divergence of Q-learning with linear FA. Empirical results from applying the counterexample to the LCS verified the predictions of the theory: (1) the LCS with Q-bucket-brigade diverged under the prediction problem, in which the action-selection policy is fixed; and (2) the divergence was avoided by using the implicit-bucket-brigade or by applying the residual gradient algorithm to Q-bucket-brigade.
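The mechanism behind the predicted divergence can be sketched outside the LCS framework. The following minimal Python sketch is not the paper's experiment: it uses the classic 7-state version of Baird's star counterexample with linear FA, whereas the paper uses an 11-state variant and the Q-bucket-brigade formulation; the feature layout, step sizes, and function names below are illustrative assumptions. It contrasts a synchronous semi-gradient update (the bootstrapped, bucket-brigade-style update, which diverges under a fixed policy) with Baird's residual gradient update, which descends the true gradient of the squared Bellman error and remains stable.

```python
import numpy as np

# Illustrative sketch: Baird's 7-state star problem with linear function
# approximation. Under the fixed evaluation policy every state transitions
# to state 7 (index 6); all rewards are zero, so the true values are 0.
GAMMA = 0.99
ALPHA = 0.01

# Feature matrix (7 states x 8 weights), as in Baird (1995):
# states 1..6 use v(s) = 2*w_s + w_8; state 7 uses v(7) = w_7 + 2*w_8.
PHI = np.zeros((7, 8))
for s in range(6):
    PHI[s, s] = 2.0
    PHI[s, 7] = 1.0
PHI[6, 6] = 1.0
PHI[6, 7] = 2.0

def bellman_error(w):
    v = PHI @ w
    return np.linalg.norm(GAMMA * v[6] - v)  # reward 0, next state is 7

def semi_gradient_step(w):
    # Semi-gradient: the bootstrapped target gamma*v(7) is treated as a
    # constant, so the update is not a true gradient and can diverge.
    v = PHI @ w
    delta = GAMMA * v[6] - v
    return w + ALPHA * PHI.T @ delta

def residual_gradient_step(w):
    # Residual gradient: descend the exact gradient of 0.5*||delta||^2.
    # Transitions here are deterministic, so no double sampling is needed.
    v = PHI @ w
    delta = GAMMA * v[6] - v
    return w + ALPHA * (PHI - GAMMA * PHI[6]).T @ delta

for name, step in [("semi-gradient", semi_gradient_step),
                   ("residual gradient", residual_gradient_step)]:
    w = np.array([1., 1., 1., 1., 1., 1., 10., 1.])  # standard divergent init
    for _ in range(2000):
        w = step(w)
    print(f"{name}: Bellman error after 2000 sweeps = {bellman_error(w):.3g}")
```

Running the sketch shows the semi-gradient Bellman error growing by many orders of magnitude while the residual gradient error decays (slowly, as Baird noted), mirroring points (1) and (2) of the abstract.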