Optimal Ordered Problem Solver

Authors:
Schmidhuber J.
Affiliations:
-
Venue:
Optimal Ordered Problem Solver
Year:
2002



Abstract

We present a novel, general, optimally fast, incremental way of searching for a universal algorithm that solves each task in a sequence of tasks. The Optimal Ordered Problem Solver (OOPS) continually organizes and exploits previously found solutions to earlier tasks, efficiently searching not only the space of domain-specific algorithms, but also the space of search algorithms. Essentially, we extend the principles of optimal nonincremental universal search to build an incremental universal learner that is able to improve itself through experience. The initial bias is embodied by a task-dependent probability distribution on possible program prefixes. Prefixes are self-delimiting and executed in online fashion while being generated. They compute the probabilities of their own possible continuations. Let p^n denote a found prefix solving the first n tasks. It may exploit previously stored solutions p^i, i < n, by calling them as subprograms, or by copying them and editing the copies before applying them. We provide equal resources for two searches that run in parallel until p^{n+1} is discovered and stored. The first search is exhaustive; it systematically tests all possible prefixes on all tasks up to n+1. The second search is much more focused; it only searches for prefixes that start with p^n, and only tests them on task n+1, which is safe, because we already know that such prefixes solve all tasks up to n. Both searches are depth-first and bias-optimal: the branches of the search trees are program prefixes, and backtracking is triggered once the sum of the runtimes of the current prefix on all current tasks exceeds the prefix probability multiplied by the total search time so far. In illustrative experiments, our self-improver becomes the first general system that learns to solve all n-disk Towers of Hanoi tasks (solution size 2^n - 1) for n up to 30, profiting from previously solved, simpler tasks involving samples of a simple context-free language.
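The backtracking rule described in the abstract can be illustrated with a small sketch. This is not the OOPS implementation: the three-token instruction set (`OPS`), the fixed token prior (`PRIOR`), the step-count runtime measure, and the doubling of the time limit between phases are all assumptions made for illustration. In OOPS itself, prefixes can modify the probabilities of their own continuations, and two searches share resources in parallel. Here `start_prefix` only mimics the focused search that extends an already stored p^n, and its probability is taken to be 1.

```python
from typing import Optional, Sequence, Tuple

Prefix = Tuple[str, ...]
Task = Tuple[int, int]  # (start value, target value); a hypothetical task format

# Hypothetical toy instruction set: each token transforms an integer accumulator.
OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "dec": lambda x: x - 1,
}
# Assumed fixed prior over the next token (OOPS lets prefixes change this).
PRIOR = {"inc": 0.5, "dbl": 0.25, "dec": 0.25}


def run(prefix: Prefix, start: int) -> Tuple[int, int]:
    """Execute a prefix on one task; return (result, runtime in steps)."""
    x = start
    for tok in prefix:
        x = OPS[tok](x)
    return x, len(prefix)


def dfs(prefix: Prefix, prob: float, tasks: Sequence[Task],
        limit: float) -> Optional[Prefix]:
    """Depth-first search over continuations of `prefix` within one phase."""
    results = [run(prefix, start) for start, _ in tasks]
    total_runtime = sum(steps for _, steps in results)
    # Backtrack once the summed runtime exceeds probability times time limit.
    if total_runtime > prob * limit:
        return None
    solved = all(value == target
                 for (value, _), (_, target) in zip(results, tasks))
    if prefix and solved:
        return prefix
    for tok, p in PRIOR.items():
        found = dfs(prefix + (tok,), prob * p, tasks, limit)
        if found is not None:
            return found
    return None


def search(tasks: Sequence[Task], start_prefix: Prefix = (),
           max_phases: int = 20) -> Optional[Prefix]:
    """Repeat the depth-first search with a doubling time limit (assumed)."""
    limit = 1.0
    for _ in range(max_phases):
        found = dfs(tuple(start_prefix), 1.0, tasks, limit)
        if found is not None:
            return found
        limit *= 2
    return None
```

Calling `search([(0, 4)])` looks for a prefix that maps 0 to 4 from scratch, while `search([(0, 8)], start_prefix=("inc", "inc"))` only considers prefixes that begin with the given stored prefix, in the spirit of the second, focused search. The depth of each phase is bounded automatically, because every added token raises the runtime while lowering the prefix probability.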