Deep multi-layer neural networks have many levels of non-linearities, allowing them to compactly represent highly non-linear and highly-varying functions. Until recently, however, it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure that relies on the training algorithm for restricted Boltzmann machines (RBMs) to initialize the parameters of a deep belief network (DBN), a generative model with many layers of hidden causal variables. This was followed by another greedy layer-wise procedure that relies on autoassociator networks. In the context of the above optimization problem, we study these algorithms empirically to better understand their success. Our experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps optimization by initializing the weights in a region near a good local minimum; it also implicitly acts as a kind of regularizer, yielding better generalization and encouraging internal distributed representations that are high-level abstractions of the input. We also present a series of experiments that evaluate the link between the performance of deep neural networks and practical aspects of their topology, for example demonstrating cases where adding depth helps. Finally, we empirically explore simple variants of these training algorithms, such as different RBM input-unit distributions, a simple way of combining gradient estimators to improve performance, and online versions of these algorithms.
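
The abstract describes the procedure only at a high level. As a rough illustration of the autoassociator variant, the following NumPy sketch pre-trains a stack of sigmoid layers greedily, each layer trained to reconstruct the codes produced by the layer below. This is a minimal sketch, not the authors' implementation: the tied weights, cross-entropy reconstruction loss, plain per-example SGD, and all hyperparameters are assumptions made for brevity, and the RBM/contrastive-divergence variant is not shown.

```python
# Illustrative sketch of greedy layer-wise pre-training with autoassociators.
# Assumptions (not from the paper): tied weights, sigmoid units, cross-entropy
# reconstruction loss, per-example SGD, toy hyperparameters.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(X, n_hidden, lr=0.1, epochs=10):
    """Train one autoassociator to reconstruct X; return its (W, b)."""
    n_visible = X.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b = np.zeros(n_hidden)    # hidden (encoder) bias
    c = np.zeros(n_visible)   # reconstruction (decoder) bias
    for _ in range(epochs):
        for x in X:                        # plain SGD, one example at a time
            h = sigmoid(x @ W + b)         # encode
            r = sigmoid(h @ W.T + c)       # decode with tied weights
            dr = r - x                     # grad of cross-entropy loss w.r.t.
                                           # decoder pre-activation (sigmoid out)
            dh = (dr @ W) * h * (1 - h)    # backprop into encoder pre-activation
            # tied W gets gradient contributions from both encode and decode
            W -= lr * (np.outer(x, dh) + np.outer(dr, h))
            b -= lr * dh
            c -= lr * dr
    return W, b

def greedy_pretrain(X, layer_sizes):
    """Stack layers: each autoassociator trains on the codes of the layer below."""
    params, H = [], X
    for n_hidden in layer_sizes:
        W, b = pretrain_layer(H, n_hidden)
        params.append((W, b))
        H = sigmoid(H @ W + b)   # propagate codes upward for the next layer
    return params                # initialization for supervised fine-tuning

# Toy usage: 100 binary examples of dimension 20, pre-training a 20-15-10 stack.
X = (rng.random((100, 20)) > 0.5).astype(float)
params = greedy_pretrain(X, [15, 10])
```

In the setting the experiments study, the returned weights would serve only as an initialization: a supervised output layer is added on top and the whole network is then fine-tuned with gradient-based training.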