Robustness Analysis of SARSA(λ): Different Models of Reward and Initialisation

  • Authors:
  • Marek Grześ; Daniel Kudenko

  • Affiliations:
  • Department of Computer Science, University of York, York YO10 5DD, UK (both authors)

  • Venue:
  • AIMSA '08 Proceedings of the 13th international conference on Artificial Intelligence: Methodology, Systems, and Applications
  • Year:
  • 2008

Abstract

In this paper, the robustness of SARSA(λ), the reinforcement learning algorithm with eligibility traces, is confronted with different models of reward and initialisation of the Q-table. Most empirical analyses of eligibility traces in the literature have focused mainly on the step-penalty reward. We analyse two general types of reward (final-goal and step-penalty rewards) and show that learning with long traces, i.e., with high values of λ, can lead to suboptimal solutions in some situations. The problems are identified and discussed. Specifically, the obtained results show that SARSA(λ) is sensitive to different models of reward and initialisation, and in some cases the asymptotic performance can be significantly reduced.
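To make the setting concrete, below is a minimal sketch of tabular SARSA(λ) with replacing eligibility traces on a toy chain MDP. The environment, hyperparameters, and the `reward_model` / `q_init` knobs are illustrative assumptions, not the paper's experimental setup; they merely mirror the two reward models (final-goal vs. step-penalty) and the Q-table initialisation that the abstract says the algorithm is sensitive to.

```python
import random

def sarsa_lambda(n_states=5, episodes=300, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1, q_init=0.0,
                 reward_model="goal", seed=0):
    """Tabular SARSA(lambda) with replacing eligibility traces on a toy
    chain MDP: states 0..n_states-1, action 1 moves right, action 0 moves
    left, and the episode ends on reaching the last state.

    reward_model="goal": +1 only on reaching the goal (final-goal reward).
    reward_model="step": -1 on every step (step-penalty reward).
    q_init sets the initial value of every Q-table entry.
    (All of the above are illustrative choices, not the paper's setup.)
    """
    n_actions = 2
    rng = random.Random(seed)
    Q = [[q_init] * n_actions for _ in range(n_states)]

    def policy(s):
        # epsilon-greedy with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        m = max(Q[s])
        return rng.choice([a for a in range(n_actions) if Q[s][a] == m])

    for _ in range(episodes):
        e = [[0.0] * n_actions for _ in range(n_states)]  # eligibility traces
        s = 0
        a = policy(s)
        done = False
        while not done:
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = (s2 == n_states - 1)
            if reward_model == "goal":
                r = 1.0 if done else 0.0
            else:  # step-penalty reward
                r = -1.0
            a2 = policy(s2)
            # one-step SARSA TD error
            delta = r + (0.0 if done else gamma * Q[s2][a2]) - Q[s][a]
            e[s][a] = 1.0  # replacing trace for the visited pair
            for si in range(n_states):
                for ai in range(n_actions):
                    Q[si][ai] += alpha * delta * e[si][ai]
                    e[si][ai] *= gamma * lam  # decay all traces
            s, a = s2, a2
    return Q
```

With λ = 0.9 the TD error of a single transition is credited to all recently visited state-action pairs at once, which is exactly what makes the choice of reward model and of `q_init` consequential: the same trace mechanism propagates either sparse goal rewards or dense step penalties back along the trajectory.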