Embedding a Priori Knowledge in Reinforcement Learning

  • Authors:
  • Carlos H. C. Ribeiro

  • Affiliations:
  • Dept. of Electrical and Electronic Engineering, Imperial College of Science, Technology and Medicine, Exhibition Road, London SW7 2BT, U.K.; e-mail: c.ribeiro@ic.ac.uk

  • Venue:
  • Journal of Intelligent and Robotic Systems
  • Year:
  • 1998

Abstract

In recent years, temporal difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle – learning through experimentation – that causes these excessive demands on the learning agent. Additionally, one must consider that the agent is very rarely a tabula rasa: some rough knowledge about characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are kept. Extensive experimental results show that the resulting variants may lead to good performance, provided a sensible balance between risky use of prior imprecise knowledge and cautious use of learning experience is adopted.
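
The abstract does not spell out the embedding mechanism itself. Purely as an illustration of the general idea (not the paper's actual method), the sketch below shows one common way of injecting rough prior knowledge into a temporal difference learner: a standard Q-learning update is kept unchanged, while the value table is initialised from a heuristic estimate rather than zeros, so early exploration is biased by the prior. The names `prior_estimate`, the example states, and the actions are hypothetical.

```python
# Hypothetical sketch: prior-knowledge initialisation of a Q-table,
# with the standard one-step Q-learning update left untouched.

def prior_estimate(state, action):
    """Rough, possibly imprecise domain knowledge about action values (made up here)."""
    return 1.0 if action == "towards_goal" else 0.0

def make_q_table(states, actions):
    """Initialise Q-values from the prior instead of zeros."""
    return {(s, a): prior_estimate(s, a) for s in states for a in actions}

def q_update(q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.95):
    """Standard Q-learning update; the prior only shapes the starting values."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])

# Usage with made-up states and actions:
states = ["s0", "s1"]
actions = ["towards_goal", "away_from_goal"]
q = make_q_table(states, actions)
q_update(q, "s0", "towards_goal", reward=0.0, s_next="s1", actions=actions)
```

Because the learning rule is unchanged, the usual convergence analysis of the base algorithm still applies; the prior only determines where exploration starts, which is one way to preserve "the mathematical structure of the basic learning algorithm" mentioned in the abstract.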