Aiming to clarify the conditions under which the reinforcement process of a Learning Classifier System (LCS) converges or diverges, this paper explores: (1) an extreme condition under which the reinforcement process of an LCS diverges; and (2) methods to avoid such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of an LCS and Reinforcement Learning (RL) with a Function Approximation (FA) method, we present a counterexample for an LCS with the Q-bucket-brigade based on the 11-state star problem, a counterexample originally proposed to show the divergence of Q-learning with linear FA. Empirical results from applying the counterexample to the LCS confirmed the predictions of the theory: (1) the LCS with the Q-bucket-brigade diverged under prediction problems, where the action-selection policy was fixed; and (2) such divergence was avoided by using the implicit-bucket-brigade or by applying the residual gradient algorithm to the Q-bucket-brigade.
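The divergence/repair contrast described above is easy to reproduce numerically. The sketch below uses a 7-state Baird-style star as a simplified stand-in (the paper's exact 11-state construction is not reproduced here): under the fixed policy every state transitions to the central state, all rewards are zero, and values are linear in overparameterized features. It compares the direct semi-gradient TD(0) update, the linear-FA analogue of the Q-bucket-brigade, against Baird's residual gradient update. The feature layout, step size, and initialization are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Baird-style "star" prediction problem (7-state stand-in): under the
# fixed policy, every state transitions to the central state; rewards
# are all zero, so the true value of every state is 0. The features
# are overparameterized, which is what enables divergence.
GAMMA = 0.99
N_OUTER = 6

# Feature matrix: row s is phi(s).
# Outer state i: 2*e_i + e_last (shared component).
# Central state: e_center + 2*e_last.
PHI = np.zeros((N_OUTER + 1, N_OUTER + 2))
for i in range(N_OUTER):
    PHI[i, i] = 2.0
    PHI[i, -1] = 1.0
PHI[N_OUTER, N_OUTER] = 1.0
PHI[N_OUTER, -1] = 2.0
PHI_CENTER = PHI[N_OUTER]

def sweep(w, alpha, residual_gradient):
    """One synchronous sweep of TD(0)-style updates over all states."""
    dw = np.zeros_like(w)
    for s in range(N_OUTER + 1):
        # TD error with zero reward: delta = gamma * V(center) - V(s).
        delta = GAMMA * PHI_CENTER @ w - PHI[s] @ w
        if residual_gradient:
            # Residual gradient (Baird 1995): true gradient of the
            # squared Bellman residual; stays bounded.
            dw += alpha * delta * (PHI[s] - GAMMA * PHI_CENTER)
        else:
            # Direct semi-gradient update, the linear-FA analogue of
            # the Q-bucket-brigade's bootstrapped update; diverges here.
            dw += alpha * delta * PHI[s]
    return w + dw

for residual in (False, True):
    w = np.ones(N_OUTER + 2)
    w[N_OUTER] = 10.0  # skewed initialization, as in Baird's demonstrations
    for _ in range(2000):
        w = sweep(w, alpha=0.01, residual_gradient=residual)
    label = "residual-gradient" if residual else "semi-gradient"
    print(f"{label:18s} |w| after 2000 sweeps: {np.linalg.norm(w):.3g}")
```

Running this, the semi-gradient weight norm grows without bound while the residual-gradient norm stays bounded, mirroring the paper's finding that the Q-bucket-brigade can diverge on fixed-policy prediction problems while the residual gradient variant does not.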