Standard and averaging reinforcement learning in XCS

  • Authors:
  • Pier Luca Lanzi; Daniele Loiacono

  • Affiliations:
  • Pier Luca Lanzi: Politecnico di Milano, Milano, Italy, and University of Illinois at Urbana-Champaign, Urbana, IL; Daniele Loiacono: Politecnico di Milano, Milano, Italy

  • Venue:
  • Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO '06)
  • Year:
  • 2006

Abstract

This paper investigates reinforcement learning (RL) in XCS. First, it formally shows that XCS implements a method of generalized RL based on linear approximators, in which the usual input mapping function translates the state-action space into a niche-relative fitness space. Then, it shows that, although XCS has always been related to standard RL, it is actually a method of averaging RL: the update of XCS with gradient descent can be derived directly from the typical update rule of averaging RL. It is noted that the use of averaging RL in XCS introduces an intrinsic preference toward classifiers with smaller fitness in the niche. It is argued that, because of the accuracy pressure in XCS, this results in an additional preference toward specificity. A simple experiment is presented to support this hypothesis. The same analysis is applied to XCS with computed prediction (XCSF), and similar conclusions are drawn.
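
The contrast between the two update rules can be made concrete. Below is a minimal Python sketch (not the authors' code) comparing the standard XCS prediction update, p ← p + β(P − p), with the commonly cited gradient-descent update, p ← p + β(P − p)(F/ΣF), which the paper identifies as an averaging-RL update. The classifier structure, the niche, and all parameter values are illustrative assumptions.

```python
# A minimal sketch (not the authors' code): the standard XCS prediction
# update versus the gradient-descent update that the paper identifies as
# an averaging-RL update. All names and parameter values are illustrative.

BETA = 0.2  # XCS learning rate (assumed value)

class Classifier:
    def __init__(self, prediction, fitness):
        self.prediction = prediction  # payoff estimate p_cl
        self.fitness = fitness        # accuracy-based fitness F_cl

def standard_update(action_set, target):
    # Standard XCS: every classifier in the niche moves toward the
    # target P with the same learning rate.
    for cl in action_set:
        cl.prediction += BETA * (target - cl.prediction)

def averaging_update(action_set, target):
    # Gradient-descent / averaging-RL form: each step is scaled by the
    # classifier's relative fitness F_cl / sum(F), so classifiers with
    # smaller fitness in the niche are updated more slowly.
    fitness_sum = sum(cl.fitness for cl in action_set)
    for cl in action_set:
        cl.prediction += BETA * (target - cl.prediction) * (cl.fitness / fitness_sum)

def system_prediction(action_set):
    # Fitness-weighted niche prediction: the linear (averaging)
    # approximator whose gradient yields the F_cl / sum(F) factor above.
    fitness_sum = sum(cl.fitness for cl in action_set)
    return sum(cl.prediction * cl.fitness for cl in action_set) / fitness_sum

# Usage: a two-classifier niche chasing a fixed payoff of 1000.
niche = [Classifier(prediction=0.0, fitness=0.9),
         Classifier(prediction=0.0, fitness=0.1)]
for _ in range(50):
    averaging_update(niche, target=1000.0)
print([round(cl.prediction, 1) for cl in niche],
      round(system_prediction(niche), 1))
```

In this sketch, the low-fitness classifier's estimate converges toward the target noticeably more slowly than the high-fitness one's, which is the asymmetry between niche members that the abstract attributes to the averaging update.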