In on-line reinforcement learning, a large number of estimation parameters (e.g., Q-value estimates for 1-step Q-learning) are typically maintained and dynamically updated as information comes to hand during the learning process. Excessive variance in these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning impossible. Estimator variance is usually managed only indirectly, by selecting global learning-algorithm parameters (e.g., λ for TD(λ)-based methods) that trade off an acceptable level of estimator perturbation against other desirable system attributes, such as reduced estimator bias. In this paper, we argue that this approach may not always be adequate, particularly for noisy and non-Markovian domains, and we present a direct approach to managing estimator variance: the ccBeta algorithm. Empirical results in an autonomous robotics domain are also presented, showing improved performance with the new ccBeta method.
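The abstract names the ccBeta algorithm but does not specify its update rule. As a minimal, hedged sketch of what managing estimator variance "directly" (rather than through a global λ) might look like, the Python fragment below adapts a per-state step size from an exponentially weighted autocorrelation statistic of the TD error. The class name AdaptiveStepTD, the decay constant, and the beta_min floor are illustrative assumptions, not the paper's definition of ccBeta.

```python
import numpy as np

# Illustrative sketch only: a TD(0) value learner whose per-state step size
# is driven by error statistics instead of a single global parameter.
class AdaptiveStepTD:
    def __init__(self, n_states, gamma=0.95, decay=0.9, beta_min=0.01):
        self.V = np.zeros(n_states)             # value estimates
        self.err_mean = np.zeros(n_states)      # decayed mean of TD errors
        self.err_sq = np.full(n_states, 1e-8)   # decayed mean of squared errors
        self.gamma = gamma                      # discount factor
        self.decay = decay                      # decay for error statistics (assumed)
        self.beta_min = beta_min                # floor on the step size (assumed)

    def update(self, s, r, s_next):
        # Standard TD(0) error for transition (s, r, s_next).
        e = r + self.gamma * self.V[s_next] - self.V[s]
        # Track decayed first and second moments of the error signal.
        self.err_mean[s] = self.decay * self.err_mean[s] + (1 - self.decay) * e
        self.err_sq[s] = self.decay * self.err_sq[s] + (1 - self.decay) * e * e
        # Consistently signed errors suggest bias: keep the step size large.
        # Zero-mean, noisy errors suggest variance: shrink the step size.
        cc = abs(self.err_mean[s]) / np.sqrt(self.err_sq[s])
        beta = max(cc, self.beta_min)
        self.V[s] += beta * e
        return e, beta
```

The intent of this rule matches the motivation stated in the abstract: in states where the TD error behaves like zero-mean noise, the autocorrelation statistic decays and the step size falls toward beta_min, suppressing estimator variance; in states with a persistent error sign, the step size stays large so systematic bias is corrected quickly.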