The goal of metalearning is to generate useful shifts of inductive bias by adapting the current learning strategy in a ``useful'' way. Our learner leads a single life during which actions are continually executed according to the system's internal state and current {\em policy} (a modifiable, probabilistic algorithm mapping environmental inputs and internal states to outputs and new internal states). An action is considered a learning algorithm if it can modify the policy. Effects of learning processes on later learning processes are measured using reward/time ratios. Occasional backtracking enforces success histories of still valid policy modifications corresponding to histories of lifelong reward accelerations. The principle allows for plugging in a wide variety of learning algorithms. In particular, it allows for embedding the learner's policy modification strategy within the policy itself (self-reference). To demonstrate the principle's feasibility in cases where conventional reinforcement learning fails, we test it in complex, non-Markovian, changing environments (``POMDPs''). One of the tasks involves more than $10^{13}$ states, two learners that both cooperate and compete, and strongly delayed reinforcement signals (initially separated by more than 300,000 time steps).
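Concretely, the ``success histories'' above can be read as a stack invariant: each still-valid policy modification is a checkpoint, and backtracking pops checkpoints whose reward/time ratio fails to improve on that of every earlier surviving checkpoint. The Python sketch below illustrates this criterion under simplifying assumptions (cumulative reward and time are tracked externally, and each modification can be undone via a callback); the names `Checkpoint`, `ssc_holds`, and `ssa_backtrack` are hypothetical, not from the paper.

```python
# Minimal sketch of the success-story criterion (SSC) and its backtracking
# step, as described above. All names here are illustrative; they are not
# taken from the paper's code.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Checkpoint:
    """A still-valid policy modification: when it was made, the cumulative
    reward collected up to that point, and a callback that undoes it."""
    time: float
    reward: float
    undo: Callable[[], None]

def ssc_holds(stack: List[Checkpoint], now: float, reward_now: float) -> bool:
    """Success-story criterion: the reward/time ratio measured since each
    still-valid modification must strictly increase from the oldest to the
    newest modification, i.e. every surviving bias shift must mark the start
    of a long-term acceleration of the average reward intake. Assumes
    now > cp.time for every checkpoint on the stack."""
    prev_ratio = float("-inf")
    for cp in stack:  # bottom (oldest) to top (newest)
        ratio = (reward_now - cp.reward) / (now - cp.time)
        if ratio <= prev_ratio:
            return False
        prev_ratio = ratio
    return True

def ssa_backtrack(stack: List[Checkpoint], now: float, reward_now: float) -> None:
    """Occasional backtracking: pop and undo the most recent modifications
    until the remaining stack satisfies the SSC. What survives is a success
    history corresponding to lifelong reward accelerations."""
    while stack and not ssc_holds(stack, now, reward_now):
        stack.pop().undo()
```

In this reading, a learner would invoke `ssa_backtrack` at each evaluation point; only policy modifications that have sustained an acceleration of the average reward remain valid afterwards.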