Image Classification with the Fisher Vector: Theory and Practice

Authors:
Jorge Sánchez;Florent Perronnin;Thomas Mensink;Jakob Verbeek
Affiliations:
CIEM-CONICET, FaMAF, Universidad Nacional de Córdoba, Córdoba, Argentina X5000HUA;Xerox Research Centre Europe, Meylan, France 38240;Inteligent Systems Lab Amsterdam, University of Amsterdam, Amsterdam, The Netherlands 1098 XH;LEAR Team, INRIA Grenoble, Montbonnot, France 38330
Venue:
International Journal of Computer Vision
Year:
2013

Citing 31
Cited 0

Training with noise is equivalent to Tikhonov regularization

Neural Computation
Exploiting generative models in discriminative classifiers

Proceedings of the 1998 conference on Advances in neural information processing systems II
Recognition with Local Features: the Kernel Recipe

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Video Google: A Text Retrieval Approach to Object Matching in Videos

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
Mercer Kernels for Object Recognition with Local Features

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2 - Volume 02
Object Categorization by Learned Universal Visual Dictionary

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study

International Journal of Computer Vision
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM

Proceedings of the 24th international conference on Machine learning
The Pascal Visual Object Classes (VOC) Challenge

International Journal of Computer Vision
Visual Word Ambiguity

IEEE Transactions on Pattern Analysis and Machine Intelligence
Evaluating Color Descriptors for Object and Scene Recognition

IEEE Transactions on Pattern Analysis and Machine Intelligence
Improving the fisher kernel for large-scale image classification

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
What does classifying more than 10,000 image categories tell us?

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
Image classification using super-vector coding of local image descriptors

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
Product Quantization for Nearest Neighbor Search

IEEE Transactions on Pattern Analysis and Machine Intelligence
Adapted vocabularies for generic visual categorization

ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part IV
Unbiased look at dataset bias

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Discriminative affine sparse codes for image classification

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
High-dimensional signature compression for large-scale image classification

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Quantization

IEEE Transactions on Information Theory
A norm selection criterion for the generalized delta rule

IEEE Transactions on Neural Networks
Sparse kernel approximations for efficient classification and detection

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Meta-class features for large-scale object categorization on a budget

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Image categorization using Fisher kernels of non-iid image models

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Ask the locals: Multi-way local pooling for image recognition

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Modeling spatial layout with fisher vectors for image categorization

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Aggregating Local Image Descriptors into Compact Codes

IEEE Transactions on Pattern Analysis and Machine Intelligence
Modeling the spatial layout of images beyond spatial pyramids

Pattern Recognition Letters
Metric learning for large scale image classification: generalizing to new classes at near-zero cost

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II

Quantified Score

Hi-index	0.00

Visualization

Abstract

A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from an "universal" generative Gaussian mixture model. This representation, which we call Fisher vector has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets--PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K--with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.