The dropout learning algorithm

Authors:
Pierre Baldi;Peter Sadowski
Venue:
Artificial Intelligence
Year:
2014



Abstract

Dropout is a recently introduced algorithm for training neural networks by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful for understanding the non-linear case. The ensemble averaging properties of dropout in non-linear logistic networks result from three fundamental equations: (1) the approximation of the expectations of logistic functions by normalized geometric means, for which bounds and estimates are derived; (2) the algebraic equality between normalized geometric means of logistic functions and the logistic of the means, which mathematically characterizes logistic functions; and (3) the linearity of the means with respect to sums, as well as products of independent variables. The results are also extended to other classes of transfer functions, including rectified linear functions. Approximation errors tend to cancel each other and do not accumulate. Dropout can also be connected to stochastic neurons and used to predict firing rates, and to backpropagation by viewing the backward propagation as ensemble averaging in a dropout linear network. Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent. Finally, for the regularization properties of dropout, the expectation of the dropout gradient is the gradient of the corresponding approximation ensemble, regularized by an adaptive weight decay term with a propensity for self-consistent variance minimization and sparse representations.
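
The three equations named in the abstract can be checked numerically on a toy case. The following is a minimal sketch, not the authors' code: it assumes a single logistic unit whose inputs are each gated by an independent Bernoulli variable with one shared keep probability `p` (the paper allows variable rates), and it enumerates every dropout mask exactly instead of sampling. The function name `dropout_ensemble` and all parameter names are hypothetical.

```python
import itertools
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dropout_ensemble(weights, inputs, p):
    """Enumerate all 2^n dropout masks of one logistic unit.

    Assumption: each input is kept independently with probability p.
    Returns (E[sigmoid(S)], NWGM of sigmoid(S), sigmoid(E[S])).
    """
    n = len(weights)
    expectation = 0.0  # true ensemble average E[sigmoid(S)]
    log_g = 0.0        # log of the weighted geometric mean of sigmoid(S)
    log_g_comp = 0.0   # log of the weighted geometric mean of 1 - sigmoid(S)
    for mask in itertools.product([0, 1], repeat=n):
        # Probability of this particular sub-network.
        prob = math.prod(p if m else 1.0 - p for m in mask)
        # Input sum S of the unit under this mask.
        s = sum(w * x * m for w, x, m in zip(weights, inputs, mask))
        out = sigmoid(s)
        expectation += prob * out
        log_g += prob * math.log(out)
        log_g_comp += prob * math.log(1.0 - out)
    g, g_comp = math.exp(log_g), math.exp(log_g_comp)
    # Normalized weighted geometric mean: G / (G + G').
    nwgm = g / (g + g_comp)
    # Linearity of the mean: E[S] = p * sum_i w_i * x_i.
    mean_input = p * sum(w * x for w, x in zip(weights, inputs))
    return expectation, nwgm, sigmoid(mean_input)


if __name__ == "__main__":
    e, nwgm, det = dropout_ensemble([0.5, -1.0, 0.8], [1.0, 2.0, -0.5], 0.5)
    print(f"E[sigmoid(S)] = {e:.4f}")
    print(f"NWGM          = {nwgm:.4f}")
    print(f"sigmoid(E[S]) = {det:.4f}")
```

In this sketch the second and third returned values agree up to floating-point error, which is the algebraic identity (2) combined with linearity (3), while the first value differs from them by the approximation error of (1). Exact enumeration is only feasible for a handful of inputs; it is used here so that the comparison is free of sampling noise.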