Aiming to clarify the conditions under which the reinforcement process of a Learning Classifier System (LCS) converges or diverges, this paper explores: (1) an extreme condition under which the reinforcement process of an LCS diverges; and (2) methods to avoid such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of an LCS and Reinforcement Learning (RL) with a Function Approximation (FA) method, we present a counterexample for an LCS with the Q-bucket-brigade based on the 11-state star problem, a counterexample originally proposed to show the divergence of Q-learning with linear FA. Empirical results from applying the counterexample to the LCS confirmed the predictions of the theory: (1) the LCS with the Q-bucket-brigade diverged under prediction problems, where the action-selection policy was fixed; and (2) such divergence was avoided by using the implicit-bucket-brigade or by applying the residual gradient algorithm to the Q-bucket-brigade.
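The divergence/repair contrast described above is easy to reproduce numerically. The sketch below uses a 7-state Baird-style star as a simplified stand-in (the paper's exact 11-state construction is not reproduced here): under the fixed policy every state transitions to the central state, all rewards are zero, and values are linear in overparameterized features. It compares the direct semi-gradient TD(0) update, the linear-FA analogue of the Q-bucket-brigade, against Baird's residual gradient update. The feature layout, step size, and initialization are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Baird-style "star" prediction problem (7-state stand-in): under the
# fixed policy, every state transitions to the central state; rewards
# are all zero, so the true value of every state is 0. The features
# are overparameterized, which is what enables divergence.
GAMMA = 0.99
N_OUTER = 6

# Feature matrix: row s is phi(s).
# Outer state i: 2*e_i + e_last (shared component).
# Central state: e_center + 2*e_last.
PHI = np.zeros((N_OUTER + 1, N_OUTER + 2))
for i in range(N_OUTER):
    PHI[i, i] = 2.0
    PHI[i, -1] = 1.0
PHI[N_OUTER, N_OUTER] = 1.0
PHI[N_OUTER, -1] = 2.0
PHI_CENTER = PHI[N_OUTER]

def sweep(w, alpha, residual_gradient):
    """One synchronous sweep of TD(0)-style updates over all states."""
    dw = np.zeros_like(w)
    for s in range(N_OUTER + 1):
        # TD error with zero reward: delta = gamma * V(center) - V(s).
        delta = GAMMA * PHI_CENTER @ w - PHI[s] @ w
        if residual_gradient:
            # Residual gradient (Baird 1995): true gradient of the
            # squared Bellman residual; stays bounded.
            dw += alpha * delta * (PHI[s] - GAMMA * PHI_CENTER)
        else:
            # Direct semi-gradient update, the linear-FA analogue of
            # the Q-bucket-brigade's bootstrapped update; diverges here.
            dw += alpha * delta * PHI[s]
    return w + dw

for residual in (False, True):
    w = np.ones(N_OUTER + 2)
    w[N_OUTER] = 10.0  # skewed initialization, as in Baird's demonstrations
    for _ in range(2000):
        w = sweep(w, alpha=0.01, residual_gradient=residual)
    label = "residual-gradient" if residual else "semi-gradient"
    print(f"{label:18s} |w| after 2000 sweeps: {np.linalg.norm(w):.3g}")
```

Running this, the semi-gradient weight norm grows without bound while the residual-gradient norm stays bounded, mirroring the paper's finding that the Q-bucket-brigade can diverge on fixed-policy prediction problems while the residual gradient variant does not.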