Evaluation of pooling operations in convolutional architectures for object recognition

Authors:
Dominik Scherer; Andreas Müller; Sven Behnke
Affiliations:
University of Bonn, Institute of Computer Science VI, Bonn, Germany (all three authors)
Venue:
ICANN'10 Proceedings of the 20th international conference on Artificial neural networks: Part III
Year:
2010

Abstract

A common practice for obtaining invariant features in object recognition models is to aggregate multiple low-level features over a small neighborhood. However, the differences between those models make it hard to compare the properties of different aggregation functions. Our aim is to gain insight into different functions by directly comparing them on a fixed architecture for several common object recognition tasks. Empirical results show that a maximum pooling operation significantly outperforms subsampling operations. Despite their shift-invariant properties, overlapping pooling windows offer no significant improvement over non-overlapping pooling windows. By applying this knowledge, we achieve state-of-the-art error rates of 4.57% on the NORB normalized-uniform dataset and 5.6% on the NORB jittered-cluttered dataset.
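To make the aggregation functions named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The helper names `pool2d` and `mean`, the 4x4 feature map, and the 2x2 window size are all illustrative assumptions. Subsampling is simplified here to a plain window average; subsampling layers in convolutional networks typically also apply a trainable scale, bias, and nonlinearity, which are omitted. A stride equal to the window size gives non-overlapping windows, and a smaller stride gives overlapping ones.

```python
def pool2d(feature_map, size, stride, reduce_fn):
    """Slide a size x size window over a 2D list and reduce each window.

    Hypothetical helper for illustration only.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - size + 1, stride):
        out_row = []
        for left in range(0, cols - size + 1, stride):
            # Collect all values that fall inside the current window.
            window = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        pooled.append(out_row)
    return pooled


def mean(values):
    """Plain average, used as a simplified stand-in for subsampling."""
    return sum(values) / len(values)


# Made-up 4x4 feature map (assumption, not data from the paper).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 4],
]

# Non-overlapping 2x2 windows: stride equals window size.
max_pooled = pool2d(feature_map, size=2, stride=2, reduce_fn=max)
avg_pooled = pool2d(feature_map, size=2, stride=2, reduce_fn=mean)

# Overlapping 2x2 windows: stride smaller than window size.
overlap_max = pool2d(feature_map, size=2, stride=1, reduce_fn=max)

print(max_pooled)   # [[4, 2], [2, 5]]
print(avg_pooled)   # [[2.5, 1.0], [0.75, 3.5]]
print(overlap_max)  # [[4, 3, 2], [4, 5, 5], [2, 5, 5]]
```

Max pooling keeps only the strongest response in each window, while averaging blends strong and weak responses together. The overlapping variant produces a larger output (3x3 instead of 2x2) from the same input, so it costs more computation in later layers.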