Transforming auto-encoders

  • Authors:
  • Geoffrey E. Hinton; Alex Krizhevsky; Sida D. Wang

  • Affiliations:
  • Department of Computer Science, University of Toronto (all authors)

  • Venue:
  • ICANN'11: Proceedings of the 21st International Conference on Artificial Neural Networks - Volume Part I
  • Year:
  • 2011

Abstract

The artificial neural networks that are used to recognize shapes typically use one or more layers of learned feature detectors that produce scalar outputs. By contrast, the computer vision community uses complicated, hand-engineered features, like SIFT [6], that produce a whole vector of outputs including an explicit representation of the pose of the feature. We show how neural networks can be used to learn features that output a whole vector of instantiation parameters, and we argue that this is a much more promising way of dealing with variations in position, orientation, scale and lighting than the methods currently employed in the neural network community. It is also more promising than the hand-engineered features currently used in computer vision because it provides an efficient way of adapting the features to the domain.
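
To make the idea of features that output instantiation parameters concrete, here is a minimal sketch, not the authors' code, of a transforming auto-encoder for 2-D translations. It assumes 28x28 flattened inputs and hypothetical sizes (30 capsules, 10 recognition units and 20 generation units per capsule, MSE reconstruction loss); each capsule's recognition units output an (x, y) pair plus a presence probability, the known shift (dx, dy) is added to (x, y), and the generation units reconstruct the shifted image, gated by the probability.

```python
# Sketch of a transforming auto-encoder for image translations, assuming
# PyTorch and the hypothetical layer sizes named in the lead-in.
import torch
import torch.nn as nn

class Capsule(nn.Module):
    def __init__(self, in_dim=784, rec_dim=10, gen_dim=20, out_dim=784):
        super().__init__()
        self.recognise = nn.Sequential(nn.Linear(in_dim, rec_dim), nn.Sigmoid())
        self.xy = nn.Linear(rec_dim, 2)     # instantiation parameters (x, y)
        self.prob = nn.Linear(rec_dim, 1)   # probability the feature is present
        self.generate = nn.Sequential(nn.Linear(2, gen_dim), nn.Sigmoid(),
                                      nn.Linear(gen_dim, out_dim))

    def forward(self, image, shift):
        h = self.recognise(image)
        xy = self.xy(h) + shift             # apply the known transformation
        p = torch.sigmoid(self.prob(h))     # gate this capsule's contribution
        return p * self.generate(xy)

class TransformingAutoencoder(nn.Module):
    def __init__(self, n_capsules=30):
        super().__init__()
        self.capsules = nn.ModuleList(Capsule() for _ in range(n_capsules))

    def forward(self, image, shift):
        # Sum the gated capsule outputs, then squash to pixel range.
        return torch.sigmoid(sum(c(image, shift) for c in self.capsules))

# Training pairs each image with a randomly shifted copy: the network sees only
# the pixels and the shift (dx, dy), and must output the shifted image, which
# pushes the learned (x, y) outputs to behave like real image coordinates.
model = TransformingAutoencoder()
image = torch.rand(8, 784)                    # batch of flattened inputs
shift = torch.randint(-3, 4, (8, 2)).float()  # known (dx, dy) per example
target = torch.rand(8, 784)                   # placeholder for shifted images
loss = nn.functional.mse_loss(model(image, shift), target)
loss.backward()
```

The same scheme extends beyond translations: replacing the 2-D (x, y) vector with a richer set of instantiation parameters (e.g. a full affine matrix) and applying the known transformation to that vector is the generalization the abstract argues for, though the details are in the paper itself.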