Limits in long path learning with XCS

  • Authors:
  • Alwyn Barry

  • Affiliations:
  • Department of Computer Science, University of Bath, Bath, UK

  • Venue:
  • GECCO'03 Proceedings of the 2003 international conference on Genetic and evolutionary computation: Part II
  • Year:
  • 2003

Abstract

The development of the XCS Learning Classifier System [26] has produced a stable implementation that can consistently identify the accurate and optimally general population of classifiers mapping a given reward landscape [15,16,29]. XCS is particularly powerful within direct-reward environments, notably within problems suitable for commercial application [3]. The application of XCS within delayed-reward environments has also shown promise, although early investigations were within environments with a comparatively short delay to reward (e.g. [28,19]). Subsequent systematic investigations [19,20,1,2] have suggested that XCS has difficulty accurately mapping and exploiting even simple environments with moderate reward delays. This paper summarises these results and presents new results that identify some limits and their implications. A modification to the error computation within XCS is introduced that allows the minimum error parameter to be applied relative to the magnitude of the payoff to each classifier. First results demonstrate that this modification enables XCS to successfully map longer delayed-reward environments.
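
As an illustration only, the sketch below shows one way to read the idea of a minimum error threshold applied relative to payoff magnitude. It is not the paper's formulation: the function names are hypothetical, and the reward of 1000, discount factor of 0.71, and threshold of 1% are assumed values chosen for the example. It models a single over-general classifier that covers two adjacent states on a reward path, and asks at which distance from the reward its error first falls below the threshold, so that it would be treated as accurate.

```python
def discounted_payoffs(reward, gamma, steps):
    """Payoff expected d steps before the reward, assuming payoff = reward * gamma**d."""
    return [reward * gamma ** d for d in range(steps)]


def merged_error(p_near, p_far):
    """Mean absolute error of one classifier covering two states visited equally often."""
    return abs(p_near - p_far) / 2.0


def first_overgeneral_step(payoffs, threshold_for):
    """Return the first distance d at which a classifier merging states d and d+1
    has an error below the threshold, or None if that never happens.

    threshold_for is a hypothetical callback mapping the classifier's
    prediction to the error threshold applied to it.
    """
    for d in range(len(payoffs) - 1):
        prediction = (payoffs[d] + payoffs[d + 1]) / 2.0
        if merged_error(payoffs[d], payoffs[d + 1]) < threshold_for(prediction):
            return d
    return None


# Assumed example values, not taken from the paper.
REWARD = 1000.0
GAMMA = 0.71
EPS0_FRACTION = 0.01

payoffs = discounted_payoffs(REWARD, GAMMA, 20)

# Absolute threshold: a fixed fraction of the maximum reward.
absolute = first_overgeneral_step(payoffs, lambda p: EPS0_FRACTION * REWARD)

# Relative threshold: the same fraction, scaled by the classifier's own prediction.
relative = first_overgeneral_step(payoffs, lambda p: EPS0_FRACTION * abs(p))

print("absolute threshold first accepts over-general classifier at step:", absolute)
print("relative threshold first accepts over-general classifier at step:", relative)
```

Under these assumptions the discounted payoffs shrink geometrically, so a fixed threshold eventually exceeds the difference between neighbouring payoffs, while a threshold that scales with the prediction keeps the same proportion at every distance from the reward.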