Improving the Fisher kernel for large-scale image classification

Authors:
Florent Perronnin;Jorge Sánchez;Thomas Mensink
Affiliations:
Xerox Research Centre Europe;Xerox Research Centre Europe;Xerox Research Centre Europe
Venue:
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
Year:
2010

Abstract

The Fisher kernel (FK) is a generic framework that combines the benefits of generative and discriminative approaches. In the context of image classification, the FK was shown to extend the popular bag-of-visual-words (BOV) representation by going beyond count statistics. In practice, however, this enriched representation has not yet shown its superiority over the BOV. In the first part, we show that several well-motivated modifications to the original framework boost the accuracy of the FK. On PASCAL VOC 2007, we increase the Average Precision (AP) from 47.9% to 58.3%. Similarly, we demonstrate state-of-the-art accuracy on CalTech 256. A major advantage is that these results are obtained using only SIFT descriptors and computationally inexpensive linear classifiers. Equipped with this representation, we can now explore image classification on a larger scale. In the second part, as an application, we compare two abundant resources of labeled images for learning classifiers: ImageNet and Flickr groups. In an evaluation involving hundreds of thousands of training images, we show that classifiers learned on Flickr groups perform surprisingly well (although the groups were not created for this purpose) and that they can complement classifiers learned on more carefully annotated datasets.
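To make "going beyond count statistics" concrete, here is a minimal sketch in pure Python. It assumes a diagonal-covariance Gaussian mixture (weights, means, variances) that has already been fitted on SIFT descriptors elsewhere, and all function names are hypothetical. A BOV histogram would keep only the soft counts per Gaussian. This encoder instead accumulates soft-assigned first- and second-order statistics. It then applies a signed power normalization (alpha = 0.5) and an L2 normalization, two of the modifications described in the full paper. Treat it as an illustrative reimplementation under these assumptions, not the authors' code. Spatial pyramids and the gradient with respect to the mixture weights are omitted.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gaussian(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def soft_assignments(x, weights, means, variances) -> List[float]:
    """Posterior gamma_t(i) of each Gaussian given descriptor x (log-sum-exp for stability)."""
    logs = [math.log(w) + log_gaussian(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(descriptors, weights, means, variances, alpha=0.5) -> List[float]:
    """Hypothetical helper: Fisher vector of local descriptors w.r.t. a diagonal GMM.

    Keeps the gradients w.r.t. the means and variances (2*K*D values), then
    applies signed power normalization followed by L2 normalization.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = soft_assignments(x, weights, means, variances)
        for i in range(K):
            for d in range(D):
                u = (x[d] - means[i][d]) / math.sqrt(variances[i][d])
                g_mu[i][d] += gamma[i] * u                 # first-order statistics
                g_sigma[i][d] += gamma[i] * (u * u - 1.0)  # second-order statistics
    fv: List[float] = []
    for i in range(K):
        fv.extend(v / (T * math.sqrt(weights[i])) for v in g_mu[i])
    for i in range(K):
        fv.extend(v / (T * math.sqrt(2.0 * weights[i])) for v in g_sigma[i])
    # Signed power normalization: z -> sign(z) * |z|^alpha.
    fv = [math.copysign(abs(z) ** alpha, z) for z in fv]
    # L2 normalization: a dot product between two such vectors becomes a cosine similarity.
    norm = math.sqrt(sum(z * z for z in fv))
    return [z / norm for z in fv] if norm > 0 else fv


if __name__ == "__main__":
    # One standard-normal Gaussian and two descriptors placed symmetrically about its mean.
    print(fisher_vector([[1.0, 0.0], [-1.0, 0.0]], [1.0], [[0.0, 0.0]], [[1.0, 1.0]]))
```

In the usage example, the two descriptors cancel in the first-order terms. Along the second dimension they show less spread than the model variance, so the only nonzero entry is that negative second-order term, which equals -1 after normalization (up to floating-point rounding). Because the final vector is L2-normalized, a linear classifier on these vectors corresponds to a normalized linear kernel. This is why inexpensive linear classifiers suffice.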