Efficient ant reinforcement learning using replacing eligibility traces

Authors:
SeungGwan Lee; SeokMi Hong
Affiliations:
School of Computer Science and Information Engineering, Catholic University, Bucheon-Si, Gyeonggi-Do, Korea; School of Computer Information and Communication Engineering, Sangji University, KangWon-Do, Korea
Venue:
ICAISC'06 Proceedings of the 8th international conference on Artificial Intelligence and Soft Computing
Year:
2006

Abstract

The eligibility trace is one of the basic mechanisms in reinforcement learning for handling delayed reward. A trace indicates the degree to which each state is eligible for undergoing learning changes should a reinforcing event occur. There are two kinds of eligibility traces: accumulating traces and replacing traces. In this paper, we propose an ant reinforcement learning algorithm, Ant-TD(λ), that uses replacing eligibility traces. The method is a hybrid of Ant-Q and eligibility traces. With replacing traces, the eligibility trace of the maximum (MaxAQ(s,z)) state visited on a step is reset to 1, and the eligibility traces of all other states decay by γλ. Although replacing traces differ only slightly from accumulating traces, they can produce a significant improvement in optimization. Experiments show that the proposed reinforcement learning method converges to the optimal solution faster than ACS and Ant-Q.
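The difference between the two trace types can be illustrated with a minimal sketch. This is the generic textbook trace update, not the authors' Ant-TD(λ) implementation; the function name `update_traces` and its parameters are hypothetical, and `visited` stands in for the state selected on the current step (in the paper, the MaxAQ(s,z) state).

```python
from typing import Dict, Hashable


def update_traces(
    traces: Dict[Hashable, float],
    visited: Hashable,
    gamma: float,
    lam: float,
    replacing: bool = True,
) -> Dict[Hashable, float]:
    """Return a new trace table after one step (illustrative sketch only).

    Every existing trace decays by gamma * lam. The visited state is then
    either reset to 1 (replacing) or incremented by 1 (accumulating).
    """
    decay = gamma * lam
    # Decay the trace of every state seen so far.
    updated = {state: value * decay for state, value in traces.items()}
    if replacing:
        # Replacing trace: the visited state's trace is set to exactly 1.
        updated[visited] = 1.0
    else:
        # Accumulating trace: the visited state's decayed trace grows by 1.
        updated[visited] = updated.get(visited, 0.0) + 1.0
    return updated
```

For example, with γ = 0.9 and λ = 0.5 (decay factor 0.45) and the visit sequence a, b, a, the replacing rule leaves the trace of a at 1 and b at 0.45, while the accumulating rule leaves a at 1.2025 and b at 0.45. A revisited state can therefore exceed 1 only under accumulating traces.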