Revisiting the VLAD image representation

Authors:
Jonathan Delhumeau;Philippe-Henri Gosselin;Hervé Jégou;Patrick Pérez
Affiliations:
INRIA, Rennes, France;INRIA, Rennes, France;INRIA, Rennes, France;Technicolor, Rennes, France
Venue:
Proceedings of the 21st ACM international conference on Multimedia
Year:
2013

Abstract

Recent work on image retrieval has proposed indexing images with compact representations that encode powerful local descriptors, such as the closely related VLAD and Fisher vector. By combining such a representation with a suitable coding technique, it is possible to encode an image in a few dozen bytes while achieving excellent retrieval results. This paper revisits some assumptions made in this context regarding the handling of "visual burstiness", and shows that ad hoc choices are implicitly made that are not desirable. Focusing on VLAD without loss of generality, we propose to modify several steps of the original design. Albeit simple, these modifications significantly improve VLAD and make it compare favorably against the state of the art.
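
For readers unfamiliar with the representation being revisited, the following is a minimal sketch of the baseline VLAD aggregation as it is commonly described: each local descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated result is L2-normalized. It does not implement the modifications proposed in this paper, which the abstract does not detail. The function name `vlad`, the toy centroids, and the toy descriptors are illustrative assumptions, and pure-Python lists are used only to keep the sketch dependency-free.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector.

    Assumes `descriptors` and `centroids` are lists of equal-length
    float lists (hypothetical toy inputs, not a real SIFT codebook).
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid
        # (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual x - c for that centroid.
        for j in range(d):
            acc[nearest][j] += x[j] - centroids[nearest][j]
    # Concatenate the k per-centroid sums into a single k*d vector.
    v = [val for row in acc for val in row]
    # L2-normalize; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(val * val for val in v))
    return [val / norm for val in v] if norm > 0 else v


# Toy usage: two 2-D centroids, three 2-D descriptors.
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
v = vlad(toy_descriptors, toy_centroids)
print(len(v))  # vector length is k * d
```

In practice the codebook would be learned with k-means on a training set of local descriptors, and the resulting vector would be further compressed by a coding step to reach the few dozen bytes mentioned above.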