Denoising autoencoders have previously been shown to be competitive alternatives to restricted Boltzmann machines for the unsupervised pretraining of each layer of a deep architecture. We show that a simple denoising autoencoder training criterion is equivalent to matching the score (with respect to the data) of a specific energy-based model to that of a nonparametric Parzen density estimator of the data. This yields several useful insights. It defines a proper probabilistic model for the denoising autoencoder technique, making it possible in principle to sample from the model or to rank examples by their energy. It suggests a different way to apply score matching, one related to learning to denoise that does not require computing second derivatives. It justifies the use of tied weights between the encoder and decoder, and it suggests ways to extend the success of denoising autoencoders to a larger family of energy-based models.
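The stated equivalence can be illustrated numerically in a simple Gaussian setting, where everything is analytic: if data x ~ N(0, 1) is corrupted as x̃ = x + σε, the Parzen-smoothed density is q_σ = N(0, 1 + σ²), whose score at x̃ is −x̃/(1 + σ²). The optimal denoiser r then satisfies (r(x̃) − x̃)/σ² = ∇ log q_σ(x̃). The sketch below is our own illustrative example (not code from the paper): it fits a least-squares linear denoiser and checks that the score it implies agrees with the analytic score of the smoothed density.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.7

# Clean data from N(0, 1) and its Gaussian-corrupted version.
x = rng.normal(size=200_000)
xt = x + sigma * rng.normal(size=x.shape)

# Least-squares linear denoiser r(xt) = a * xt (no intercept: both means are ~0).
a = (xt @ x) / (xt @ xt)

# Score implied by denoising: (r(xt) - xt) / sigma^2 = ((a - 1) / sigma^2) * xt,
# i.e. a linear score function with this slope.
slope_implied = (a - 1.0) / sigma**2

# Analytic score of the smoothed density N(0, 1 + sigma^2): -xt / (1 + sigma^2).
slope_true = -1.0 / (1.0 + sigma**2)

print(slope_implied, slope_true)  # the two slopes should nearly coincide
```

With 200,000 samples the two slopes typically agree to about two decimal places, which is the one-dimensional analogue of the paper's claim that training a denoiser matches the score of a Parzen estimate of the data density.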