Modeling the spatial layout of images beyond spatial pyramids

Authors:
Jorge Sánchez;Florent Perronnin;Teófilo de Campos
Affiliations:
CIEM-CONICET, FaMAF, Universidad Nacional de Córdoba, X5000HUA Córdoba, Argentina;Xerox Research Centre Europe, 6 Chemin de Maupertuis, 38240 Meylan, France;CVSSP, University of Surrey, Guildford GU2 7XH, UK
Venue:
Pattern Recognition Letters
Year:
2012

Citing 14
Cited 2

Video Google: A Text Retrieval Approach to Object Matching in Videos

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Kernel Codebooks for Scene Categorization

ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part III
Spatial extensions to bag of visual words

Proceedings of the ACM International Conference on Image and Video Retrieval
VLFeat: an open and portable library of computer vision algorithms

Proceedings of the international conference on Multimedia
Improving the Fisher kernel for large-scale image classification

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
Efficient highly over-complete sparse coding using a mixture model

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
Image classification using super-vector coding of local image descriptors

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
Images as sets of locally weighted features

Computer Vision and Image Understanding
Global contrast based salient region detection

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Cats and dogs

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Modeling spatial layout with Fisher vectors for image categorization

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision

Images as sets of locally weighted features

Computer Vision and Image Understanding
Image Classification with the Fisher Vector: Theory and Practice

International Journal of Computer Vision

Quantified Score

Hi-index	0.10

Abstract

Several state-of-the-art image representations consist of averaging local statistics computed from patch-level descriptors. Boureau et al. have shown that such average statistics suffer from two sources of variance. The first comes from the fact that only a finite set of local statistics is averaged. The second is due to the variation in the proportion of object-dependent information between different images of the same class. For the problem of object classification, these sources of variance negatively affect accuracy, since they increase the overlap between class-conditional probabilities. Our goal is to include information about the spatial layout of images in image signatures based on average statistics. We show that the traditional approach to including the spatial layout, the spatial pyramid (SP), increases the first source of variance while only weakly reducing the second. We therefore propose two complementary approaches to account for the spatial layout that are compatible with our goal of variance reduction. The first models the spatial layout in an image-independent manner (as is the case for the SP), while the second adapts to the image content. A significant benefit of these approaches with respect to the SP is that they do not increase the dimensionality of the image signature. We show the benefits of our approach on PASCAL VOC 2007, 2008 and 2009.
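
The two properties of the spatial pyramid mentioned in the abstract (a larger signature, and fewer local statistics averaged per region) can be illustrated with a minimal sketch. This is not the authors' code and does not implement their proposed approaches: the function names, the random toy data, the 1x1 plus 2x2 grid layout and the zero vector used for empty cells are all assumptions made for illustration only.

```python
import random

def average_pool(stats):
    """Average a list of equal-length local statistic vectors."""
    n = len(stats)
    dim = len(stats[0])
    return [sum(s[d] for s in stats) / n for d in range(dim)]

def spatial_pyramid(positions, stats, levels=(1, 2)):
    """Concatenate per-cell averages over g x g grids, one grid per level.

    positions: (x, y) pairs in [0, 1); stats: matching local statistic vectors.
    Returns the signature and the number of patches that fell in each cell.
    """
    dim = len(stats[0])
    signature, counts = [], []
    for g in levels:
        for cy in range(g):
            for cx in range(g):
                cell = [s for (x, y), s in zip(positions, stats)
                        if int(x * g) == cx and int(y * g) == cy]
                counts.append(len(cell))
                # Assumption: an empty cell contributes a zero vector.
                signature.extend(average_pool(cell) if cell else [0.0] * dim)
    return signature, counts

# Toy data (hypothetical sizes): 200 patches with 4-dimensional statistics.
rng = random.Random(0)
dim, n_patches = 4, 200
positions = [(rng.random(), rng.random()) for _ in range(n_patches)]
stats = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_patches)]

global_sig = average_pool(stats)
sp_sig, counts = spatial_pyramid(positions, stats)

print(len(global_sig))  # dim
print(len(sp_sig))      # dim * (1 + 4): one block per pyramid cell
print(counts)           # patches per cell; the 1x1 cell holds all of them
```

In this sketch the signature grows from `dim` to five times `dim`, and each 2x2 cell averages only a fraction of the patches. That smaller sample per cell is the mechanism behind the first source of variance described above.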