SarsaLandmark: an algorithm for learning in POMDPs with landmarks

Authors: Michael R. James; Satinder Singh
Affiliations: Toyota Research Institute NA, Ann Arbor, MI; University of Michigan, Ann Arbor, MI
Venue: Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Year: 2009

Abstract

Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in partially observable Markov decision processes (POMDPs). Nevertheless, one can construct counterexamples, problems in which Sarsa(λ