Simple Principles of Metalearning

  • Authors: Juergen Schmidhuber; Jieyu Zhao; Marco Wiering

  • Year: 1996

Abstract

The goal of metalearning is to generate useful shifts of inductive bias by adapting the current learning strategy in a "useful" way. Our learner leads a single life during which actions are continually executed according to the system's internal state and current policy (a modifiable, probabilistic algorithm mapping environmental inputs and internal states to outputs and new internal states). An action is considered a learning algorithm if it can modify the policy. Effects of learning processes on later learning processes are measured using reward/time ratios. Occasional backtracking enforces success histories of still valid policy modifications corresponding to histories of lifelong reward accelerations. The principle allows for plugging in a wide variety of learning algorithms. In particular, it allows for embedding the learner's policy modification strategy within the policy itself (self-reference). To demonstrate the principle's feasibility in cases where conventional reinforcement learning fails, we test it in complex, non-Markovian, changing environments ("POMDPs"). One of the tasks involves more than 10^13 states, two learners that both cooperate and compete, and strongly delayed reinforcement signals (initially separated by more than 300,000 time steps).
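The abstract's core mechanism, measuring each policy modification by the reward/time ratio observed since it was made and backtracking over modifications that fail to accelerate lifelong reward, can be sketched as a small piece of bookkeeping. The Python sketch below is only an illustration of that idea; the class and method names (SuccessStoryLearner, record_modification, backtrack) and the undo_info mechanism are hypothetical and not taken from the paper. The check itself requires the reward per time step accumulated since each still-valid modification to strictly exceed the reward per time step accumulated since every earlier one.

    class SuccessStoryLearner:
        """Hedged sketch of the reward/time-ratio bookkeeping described in the
        abstract; names and structure are illustrative assumptions, not the
        authors' implementation."""

        def __init__(self):
            self.time = 0             # lifelong time-step counter
            self.total_reward = 0.0   # lifelong cumulative reward
            # Stack of (time, cumulative_reward, undo_info), one entry per
            # still-valid policy modification.
            self.valid_modifications = []

        def step(self, reward):
            """Advance lifelong time and accumulate reward."""
            self.time += 1
            self.total_reward += reward

        def record_modification(self, undo_info):
            """A learning action just modified the policy; remember how to undo it."""
            self.valid_modifications.append((self.time, self.total_reward, undo_info))

        def _ratio_since(self, t, r):
            """Reward per time step accumulated since checkpoint (t, r)."""
            return (self.total_reward - r) / max(self.time - t, 1)

        def _reward_accelerating(self):
            """True if reward/time ratios strictly increase from the start of
            life through each still-valid modification."""
            previous = self.total_reward / max(self.time, 1)
            for t, r, _ in self.valid_modifications:
                current = self._ratio_since(t, r)
                if current <= previous:
                    return False
                previous = current
            return True

        def backtrack(self, undo):
            """Pop and undo the most recent policy modifications until the
            remaining ones again form a history of reward accelerations."""
            while self.valid_modifications and not self._reward_accelerating():
                _, _, undo_info = self.valid_modifications.pop()
                undo(undo_info)


    if __name__ == "__main__":
        # Toy usage: a modification followed by a drop in reward rate is undone.
        learner = SuccessStoryLearner()
        for _ in range(10):
            learner.step(reward=1.0)                      # baseline reward rate 1.0
        learner.record_modification(undo_info="restore old policy")
        for _ in range(10):
            learner.step(reward=0.0)                      # reward rate fell after the change
        learner.backtrack(undo=lambda info: print(info))  # prints "restore old policy"

In the paper's setting the policy is probabilistic and modifications are produced by the learner's own actions; this sketch leaves the actual policy representation and the undo operation to the caller.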