Feedback of Delayed Rewards in XCS for Environments with Aliasing States
ACAL '09 Proceedings of the 4th Australian Conference on Artificial Life: Borrowing from Biology
XCS [1], the accuracy-based classifier system, elegantly combines genetic-algorithm (GA) rule learning with reinforcement learning (RL) methodologies. This makes it suitable for a wide range of applications in which generalisation over decision-making states is desirable. In addition, its Q-learning-oriented prediction update scheme enables it to handle multi-step problems adequately. This paper reports how the intertwined spirals problem, originally a popular benchmark in classification, was modified by the authors to verify XCS's suitability for the behavioural design of robotic systems. When the results were not as expected, the investigation was pursued until a rather surprising conclusion was reached: XCS cannot handle even very simple problems when rewards are temporally delayed, no matter how short the delay is.
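To make the Q-learning-oriented update scheme concrete, the sketch below shows how XCS-style classifiers in the previous action set are updated toward a discounted payoff target, in the spirit of Wilson's multi-step XCS. All names (`Classifier`, `BETA`, `GAMMA`, `update_action_set`) and the parameter values are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of the Q-learning-style prediction update in XCS
# (Widrow-Hoff delta rule toward a discounted payoff target).
# Names and constants are illustrative, not taken from the paper.

BETA = 0.2    # learning rate (assumed typical value)
GAMMA = 0.71  # discount factor (assumed typical multi-step setting)

class Classifier:
    def __init__(self, prediction=0.0, error=0.0):
        self.prediction = prediction  # estimated payoff p
        self.error = error            # estimated prediction error

def update_action_set(prev_action_set, reward, max_next_prediction):
    """Update the previous action set toward the payoff target
    P = r + gamma * max_a PA(a): the immediate reward plus the
    discounted best prediction in the next match set. This
    Q-learning-style bootstrap is what lets XCS propagate a
    delayed reward back through earlier states."""
    target = reward + GAMMA * max_next_prediction
    for cl in prev_action_set:
        cl.error += BETA * (abs(target - cl.prediction) - cl.error)
        cl.prediction += BETA * (target - cl.prediction)
    return target

# Example: a reward of 1000 one step ahead is propagated back once.
aset = [Classifier(prediction=0.0)]
p_target = update_action_set(aset, reward=0.0, max_next_prediction=1000.0)
# target is 710.0; the classifier's prediction moves from 0.0 to 142.0
```

Because the reward must be bootstrapped backwards one step at a time through these updates, even a short delay forces the system to learn a chain of accurate intermediate predictions, which is where the paper locates XCS's difficulty.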