Robustness Analysis of SARSA(λ): Different Models of Reward and Initialisation

  • Authors:
  • Marek Grześ; Daniel Kudenko

  • Affiliations:
  • Department of Computer Science, University of York, York YO10 5DD, UK (both authors)

  • Venue:
  • AIMSA '08 Proceedings of the 13th international conference on Artificial Intelligence: Methodology, Systems, and Applications
  • Year:
  • 2008

Abstract

In this paper, the robustness of SARSA(λ), the reinforcement learning algorithm with eligibility traces, is confronted with different models of reward and initialisation of the Q-table. Most empirical analyses of eligibility traces in the literature have focused mainly on the step-penalty reward. We analyse two general types of reward (final-goal and step-penalty rewards) and show that learning with long traces, i.e., with high values of λ, can lead to suboptimal solutions in some situations. The problems are identified and discussed. Specifically, the obtained results show that SARSA(λ) is sensitive to different models of reward and initialisation, and in some cases the asymptotic performance can be significantly reduced.
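To make the setting concrete, below is a minimal sketch of tabular SARSA(λ) with replacing eligibility traces on a toy chain MDP. The environment, hyperparameters, and the `reward_model` / `q_init` knobs are illustrative assumptions, not the paper's experimental setup; they merely mirror the two reward models (final-goal vs. step-penalty) and the Q-table initialisation that the abstract says the algorithm is sensitive to.

```python
import random

def sarsa_lambda(n_states=5, episodes=300, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1, q_init=0.0,
                 reward_model="goal", seed=0):
    """Tabular SARSA(lambda) with replacing eligibility traces on a toy
    chain MDP: states 0..n_states-1, action 1 moves right, action 0 moves
    left, and the episode ends on reaching the last state.

    reward_model="goal": +1 only on reaching the goal (final-goal reward).
    reward_model="step": -1 on every step (step-penalty reward).
    q_init sets the initial value of every Q-table entry.
    (All of the above are illustrative choices, not the paper's setup.)
    """
    n_actions = 2
    rng = random.Random(seed)
    Q = [[q_init] * n_actions for _ in range(n_states)]

    def policy(s):
        # epsilon-greedy with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        m = max(Q[s])
        return rng.choice([a for a in range(n_actions) if Q[s][a] == m])

    for _ in range(episodes):
        e = [[0.0] * n_actions for _ in range(n_states)]  # eligibility traces
        s = 0
        a = policy(s)
        done = False
        while not done:
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = (s2 == n_states - 1)
            if reward_model == "goal":
                r = 1.0 if done else 0.0
            else:  # step-penalty reward
                r = -1.0
            a2 = policy(s2)
            # one-step SARSA TD error
            delta = r + (0.0 if done else gamma * Q[s2][a2]) - Q[s][a]
            e[s][a] = 1.0  # replacing trace for the visited pair
            for si in range(n_states):
                for ai in range(n_actions):
                    Q[si][ai] += alpha * delta * e[si][ai]
                    e[si][ai] *= gamma * lam  # decay all traces
            s, a = s2, a2
    return Q
```

With λ = 0.9 the TD error of a single transition is credited to all recently visited state-action pairs at once, which is exactly what makes the choice of reward model and of `q_init` consequential: the same trace mechanism propagates either sparse goal rewards or dense step penalties back along the trajectory.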