Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in partially observable Markov decision processes (POMDPs). Nevertheless, one can construct counterexamples, problems in which Sarsa(λ