In recent years, temporal-difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle – learning through experimentation – that causes these excessive demands on the learning agent. Moreover, the agent is very rarely a tabula rasa: some rough knowledge about the characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are preserved. Extensive experimental results show that the resulting variants can achieve good performance, provided a sensible balance is struck between risky use of imprecise prior knowledge and cautious use of learning experience.
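One simple way to embed a priori knowledge without altering the structure of the underlying temporal-difference algorithm is to seed the value table with heuristic estimates before learning begins. The sketch below is an illustration of that idea, not the paper's actual method: a standard tabular Q-learning loop on a hypothetical one-dimensional corridor task, where the `prior` function (an assumption introduced here for illustration) initialises the Q-table with rough domain knowledge while the update rule itself is left untouched.

```python
import random

def q_learning(n_states, n_actions, step, prior=None,
               episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning. `prior(s, a)` optionally seeds the Q-table
    with heuristic values, so learning starts from rough prior knowledge
    instead of a tabula rasa; the TD update rule is unchanged."""
    rng = random.Random(seed)
    Q = [[prior(s, a) if prior else 0.0 for a in range(n_actions)]
         for s in range(n_states)]
    for _ in range(episodes):
        s = 0  # every episode starts at the left end of the corridor
        while s is not None:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: Q[s][a])
            s2, r = step(s, a)
            # standard one-step TD target; terminal states have value 0
            target = r if s2 is None else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Toy corridor: states 0..4; action 1 moves right, action 0 moves left.
# Reaching state 4 yields reward 1 and ends the episode.
def corridor_step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    if s2 == 4:
        return None, 1.0
    return s2, 0.0

# Hypothetical imprecise prior: "moving right is usually good".
prior = lambda s, a: 0.5 if a == 1 else 0.0
Q = q_learning(5, 2, corridor_step, prior=prior)
```

Because the prior only changes the initial values, convergence guarantees of the base algorithm are preserved: an overly optimistic or pessimistic prior is gradually washed out by experience, which is one concrete instance of balancing risky prior knowledge against cautious learning.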