Learning to Learn Using Gradient Descent

  • Authors:
  • Sepp Hochreiter; A. Steven Younger; Peter R. Conwell

  • Venue:
  • ICANN '01 Proceedings of the International Conference on Artificial Neural Networks
  • Year:
  • 2001

Abstract

This paper introduces the application of gradient descent methods to meta-learning. The concept of "meta-learning", i.e. of a system that improves or discovers a learning algorithm, has been of interest in machine learning for decades because of its appealing applications. Previous meta-learning approaches have been based on evolutionary methods and have therefore been restricted to small models with few free parameters. We make meta-learning in large systems feasible by using recurrent neural networks with their attendant learning routines as meta-learning systems. Our system derived complex, well-performing learning algorithms from scratch. In this paper we also show that our approach performs non-stationary time series prediction.
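
The abstract's core idea is that a recurrent network, trained by ordinary gradient descent across a family of tasks, can come to implement a learning algorithm in its recurrent dynamics: at each step the network sees the current input together with the previous target, so it can adapt its predictions within a task without any weight updates at test time. The following is a minimal sketch of that setup, not code from the paper; the task family (random linear functions), network sizes, and hyperparameters are illustrative assumptions, written here with PyTorch.

    # Hedged sketch of a gradient-descent meta-learner: an LSTM receives
    # (x_t, y_{t-1}) and predicts y_t; training across many sampled tasks
    # drives the recurrent dynamics to act as a learning algorithm.
    import torch
    import torch.nn as nn

    class MetaLearnerLSTM(nn.Module):
        """LSTM meta-learner: sees (x_t, y_{t-1}) and predicts y_t."""
        def __init__(self, in_dim=1, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(in_dim + 1, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x, y_prev):
            h, _ = self.lstm(torch.cat([x, y_prev], dim=-1))
            return self.head(h)

    def sample_task(batch=16, steps=50):
        """Hypothetical task family: random linear maps y = w * x."""
        w = torch.randn(batch, 1, 1)
        x = torch.randn(batch, steps, 1)
        return x, w * x

    model = MetaLearnerLSTM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(1000):
        x, y = sample_task()
        # Shift targets by one step: the network only ever sees y_{t-1}.
        y_prev = torch.cat([torch.zeros_like(y[:, :1]), y[:, :-1]], dim=1)
        loss = ((model(x, y_prev) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

After meta-training, the frozen network can be run on a freshly sampled task: its prediction error typically drops over the first few steps of the sequence as the hidden state accumulates information about the new task, which is the within-task "learning" that the outer gradient descent has discovered.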