Denoising autoencoders have previously been shown to be competitive alternatives to restricted Boltzmann machines for the unsupervised pretraining of each layer of a deep architecture. We show that a simple denoising autoencoder training criterion is equivalent to matching the score (with respect to the data) of a specific energy-based model to that of a nonparametric Parzen density estimator of the data. This yields several useful insights. It defines a proper probabilistic model for the denoising autoencoder technique, making it possible in principle to sample from the model or to rank examples by their energy. It suggests a different way to apply score matching, one related to learning to denoise that does not require computing second derivatives. It justifies the use of tied weights between the encoder and decoder, and it suggests ways to extend the success of denoising autoencoders to a larger family of energy-based models.
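The stated equivalence can be illustrated numerically in a simple Gaussian setting, where everything is analytic: if data x ~ N(0, 1) is corrupted as x̃ = x + σε, the Parzen-smoothed density is q_σ = N(0, 1 + σ²), whose score at x̃ is −x̃/(1 + σ²). The optimal denoiser r then satisfies (r(x̃) − x̃)/σ² = ∇ log q_σ(x̃). The sketch below is our own illustrative example (not code from the paper): it fits a least-squares linear denoiser and checks that the score it implies agrees with the analytic score of the smoothed density.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.7

# Clean data from N(0, 1) and its Gaussian-corrupted version.
x = rng.normal(size=200_000)
xt = x + sigma * rng.normal(size=x.shape)

# Least-squares linear denoiser r(xt) = a * xt (no intercept: both means are ~0).
a = (xt @ x) / (xt @ xt)

# Score implied by denoising: (r(xt) - xt) / sigma^2 = ((a - 1) / sigma^2) * xt,
# i.e. a linear score function with this slope.
slope_implied = (a - 1.0) / sigma**2

# Analytic score of the smoothed density N(0, 1 + sigma^2): -xt / (1 + sigma^2).
slope_true = -1.0 / (1.0 + sigma**2)

print(slope_implied, slope_true)  # the two slopes should nearly coincide
```

With 200,000 samples the two slopes typically agree to about two decimal places, which is the one-dimensional analogue of the paper's claim that training a denoiser matches the score of a Parzen estimate of the data density.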