Completely self-referential optimal reinforcement learners

Authors: Jürgen Schmidhuber
Affiliations: IDSIA, Manno, Switzerland and TU Munich, München, Germany
Venue: ICANN'05 Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II
Year: 2005

Abstract

We present the first class of mathematically rigorous, general, fully self-referential, self-improving, optimal reinforcement learning systems. Such a system rewrites any part of its own code as soon as it has found a proof that the rewrite is useful. The problem-dependent utility function, the hardware, and the entire initial code are described by axioms encoded in an initial proof searcher, which is itself part of the initial code. The searcher systematically and efficiently tests computable proof techniques (programs whose outputs are proofs) until it finds a provably useful, computable self-rewrite. We show that such a self-rewrite is globally optimal (no local maxima!), since the code first had to prove that it is not useful to continue the proof search for alternative self-rewrites. Unlike previous non-self-referential methods based on hardwired proof searchers, ours not only boasts an optimal order of complexity but can also optimally reduce any slowdowns hidden by the O()-notation, provided the utility of such speed-ups is provable at all.
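
The control flow described in the abstract (keep the current code until a rewrite is proven useful, then switch) can be illustrated with a deliberately small Python sketch. This is not the paper's construction. All names here (`loop_sum`, `closed_form`, `wrong_shortcut`, `utility`, `proves_useful`, `self_improve`) are hypothetical. The formal proof search is replaced by exhaustive checking over a small finite task set, the candidate rewrites are supplied by hand rather than found by the searcher, and the searcher does not rewrite itself. The sketch therefore says nothing about the global-optimality argument.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical toy setting: a "policy" maps n to (answer, steps_used).
Policy = Callable[[int], Tuple[int, int]]

# Assumption: a small finite task set, so exhaustive checking can stand in
# for a proof. The actual system searches for formal proofs instead.
DOMAIN = range(1, 51)


def loop_sum(n: int) -> Tuple[int, int]:
    """Initial policy: sum 1..n with a loop, charging one step per addition."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total, n


def closed_form(n: int) -> Tuple[int, int]:
    """Candidate rewrite: closed-form sum, charged a single step."""
    return n * (n + 1) // 2, 1


def wrong_shortcut(n: int) -> Tuple[int, int]:
    """Candidate rewrite that is cheap but incorrect; it must be rejected."""
    return n * n, 1


def utility(policy: Policy) -> Optional[int]:
    """Negative total step count over DOMAIN, or None if any answer is wrong."""
    steps = 0
    for n in DOMAIN:
        answer, cost = policy(n)
        if answer != n * (n + 1) // 2:
            return None
        steps += cost
    return -steps


def proves_useful(current: Policy, candidate: Policy) -> bool:
    """Stand-in for the proof checker: verify strictly higher utility."""
    u_cur, u_new = utility(current), utility(candidate)
    return u_cur is not None and u_new is not None and u_new > u_cur


def self_improve(initial: Policy, candidates: Iterable[Policy]) -> Policy:
    """Switch to a candidate only once it is verified to be an improvement."""
    current = initial
    for candidate in candidates:
        if proves_useful(current, candidate):
            current = candidate  # the "self-rewrite" step
    return current


if __name__ == "__main__":
    final = self_improve(loop_sum, [wrong_shortcut, closed_form])
    print(final.__name__, utility(final))
```

In this toy, the incorrect shortcut is never adopted because no verification of its usefulness succeeds, while the closed-form rewrite is adopted because its utility is checked to be strictly higher on every task in the assumed domain.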