In the context of object and scene recognition, state-of-the-art performance is obtained with Bag of Words (BoW) models of mid-level representations computed from densely sampled local descriptors (e.g. SIFT). Several methods for combining low-level features and setting mid-level parameters have been evaluated recently for image classification. In this paper, we further investigate the impact of the main parameters in the BoW pipeline. We show that an adequate combination of several low-level (sampling rate, multiscale) and mid-level (codebook size, normalization) parameters is decisive for reaching good performance. Based on this analysis, we propose a merging scheme that exploits the specificities of edge-based descriptors: low- and high-contrast regions are pooled separately and combined to provide a powerful image representation. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
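The separate-pooling idea can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the paper's implementation: it uses random 128-D "SIFT-like" descriptors, hard assignment to a random codebook, a hypothetical contrast threshold of 0.1, and per-pool L2 normalization before concatenation.

```python
import numpy as np

rng = np.random.default_rng(0)

def bow_histogram(descriptors, codebook):
    # Hard-assign each descriptor to its nearest codeword and count occurrences.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)).astype(float)

def split_pool(descriptors, contrasts, codebook, threshold=0.1):
    # Pool low- and high-contrast regions into separate BoW histograms,
    # L2-normalize each, and concatenate into one image representation.
    # The threshold value is illustrative, not taken from the paper.
    parts = []
    for subset in (descriptors[contrasts < threshold],
                   descriptors[contrasts >= threshold]):
        h = (bow_histogram(subset, codebook) if len(subset)
             else np.zeros(len(codebook)))
        n = np.linalg.norm(h)
        parts.append(h / n if n > 0 else h)
    return np.concatenate(parts)

# Toy data: 200 descriptors with per-region contrast, 32-word codebook.
desc = rng.random((200, 128))
contrasts = rng.random(200)
codebook = rng.random((32, 128))
feature = split_pool(desc, contrasts, codebook)
print(feature.shape)  # (64,) — two concatenated 32-bin histograms
```

In practice the two pooled histograms would feed a linear or additive-kernel classifier; the point of the sketch is only that low- and high-contrast regions contribute independent, separately normalized halves of the final vector.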