Ask the locals: Multi-way local pooling for image recognition

Authors:
Y-Lan Boureau; Nicolas Le Roux; Francis Bach; Jean Ponce; Yann LeCun
Affiliations:
INRIA, France; INRIA, France; INRIA, France; Ecole Normale Supérieure, France; Courant Institute, New York University, USA
Venue:
ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Year:
2011

Abstract

Invariant representations in object recognition systems are generally obtained by pooling feature vectors over spatially local neighborhoods. However, pooling is not local in feature space, so widely dissimilar features may be pooled together if they occur at nearby locations. To circumvent this problem, recent approaches rely on sophisticated encoding methods and more specialized codebooks (or dictionaries), e.g., codebooks learned on subsets of descriptors that are close in feature space. In this work, we argue that a common trait of much recent work in image recognition or retrieval is that it leverages locality in feature space on top of purely spatial locality. We propose to apply this idea in its simplest form to an object recognition system based on the spatial pyramid framework, to increase the performance of small dictionaries with very little added engineering. State-of-the-art results on several object recognition benchmarks show the promise of this approach.
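The abstract states the idea only in words: each pooling region is restricted to a spatial cell and, additionally, to a region of feature space. The following is a minimal sketch under several assumptions. Feature space is partitioned by assigning each descriptor to the nearest of P precomputed centers (for example, from k-means). Codes are non-negative and max-pooled. A one-hot hard vector-quantization coder stands in for whatever coder the full system uses. All function and parameter names (nearest, hard_vq, multiway_max_pool, multiway_pyramid, feature_centers, levels) are hypothetical. This is an illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper names throughout; an illustrative sketch only,
# not the authors' implementation.


def nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the closest center under squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
    )


def hard_vq(point: Sequence[float], dictionary: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in coder (assumption): one-hot code for the nearest dictionary atom."""
    code = [0.0] * len(dictionary)
    code[nearest(point, dictionary)] = 1.0
    return code


def multiway_max_pool(
    descriptors: Sequence[Sequence[float]],
    codes: Sequence[Sequence[float]],
    positions: Sequence[Tuple[float, float]],
    feature_centers: Sequence[Sequence[float]],
    grid: int,
) -> List[float]:
    """Max-pool codes separately in every (spatial cell, feature-space bin) pair.

    positions are (x, y) normalized to [0, 1]; grid is the number of cells per side.
    """
    n_bins = len(feature_centers)
    dim = len(codes[0])
    # Zero is a neutral starting value for max pooling because codes are
    # assumed non-negative; a pair that receives no descriptor stays all zeros.
    pooled = [[0.0] * dim for _ in range(grid * grid * n_bins)]
    for desc, code, (x, y) in zip(descriptors, codes, positions):
        # Spatial locality: row-major index of the grid cell (clamped at x or y = 1).
        cell = min(int(y * grid), grid - 1) * grid + min(int(x * grid), grid - 1)
        # Feature-space locality: route the code to its descriptor's bin.
        slot = pooled[cell * n_bins + nearest(desc, feature_centers)]
        for j, value in enumerate(code):
            slot[j] = max(slot[j], value)
    return [value for slot in pooled for value in slot]


def multiway_pyramid(descriptors, codes, positions, feature_centers, levels=(1, 2, 4)):
    """Concatenate multi-way pooled vectors over several spatial grid sizes."""
    features: List[float] = []
    for grid in levels:
        features.extend(
            multiway_max_pool(descriptors, codes, positions, feature_centers, grid)
        )
    return features


if __name__ == "__main__":
    # Toy 2-D "descriptors"; the first two are dissimilar but spatially close.
    descriptors = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0]]
    positions = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.9)]
    dictionary = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]  # K = 3 atoms
    feature_centers = [[0.0, 0.0], [1.0, 1.0]]         # P = 2 feature-space bins
    codes = [hard_vq(d, dictionary) for d in descriptors]

    # Purely spatial pooling (a single bin) merges the dissimilar codes.
    print(multiway_max_pool(descriptors, codes, positions, [[0.0, 0.0]], 1))
    # Multi-way pooling keeps them apart: (1 + 4) cells * 2 bins * 3 atoms = 30 dims.
    vec = multiway_pyramid(descriptors, codes, positions, feature_centers, levels=(1, 2))
    print(len(vec), vec[:6])
```

With a single feature-space bin, the function reduces to ordinary spatial-pyramid max pooling and prints [1.0, 0.0, 1.0]. In that case the codes of [0.0, 0.1] and [0.9, 1.0] are merged because the two descriptors share a cell. With two bins they are pooled separately, and the first six entries are [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]. The output has (total cells) × P × K dimensions, so the representation grows with P even while the dictionary size K stays small.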