Pooling in image representation: The visual codeword point of view

Authors:
Sandra Avila; Nicolas Thome; Matthieu Cord; Eduardo Valle; Arnaldo de A. Araújo
Affiliations:
Université Pierre et Marie Curie, UPMC-Sorbonne Universities, LIP6, 4 place Jussieu, 75005 Paris, France and Federal University of Minas Gerais, NPDI Lab - DCC/UFMG, Belo Horizonte, MG, Brazi ...;Université Pierre et Marie Curie, UPMC-Sorbonne Universities, LIP6, 4 place Jussieu, 75005 Paris, France;Université Pierre et Marie Curie, UPMC-Sorbonne Universities, LIP6, 4 place Jussieu, 75005 Paris, France;State University of Campinas, RECOD Lab - DCA/FEEC/UNICAMP, Campinas, SP, Brazil;Federal University of Minas Gerais, NPDI Lab - DCC/UFMG, Belo Horizonte, MG, Brazil
Venue:
Computer Vision and Image Understanding
Year:
2013

Abstract

In this work, we propose BossaNova, a novel representation for content-based concept detection in images and videos that enriches the Bag-of-Words model. The Bag-of-Words model relies on quantizing highly discriminant local descriptors with a codebook and aggregating the quantized descriptors into a single pooled feature vector; it has emerged as the most promising approach for concept detection in visual documents. BossaNova enhances that representation by keeping a histogram of distances between the descriptors found in the image and those in the codebook, thus preserving important information about the distribution of the local descriptors around each codeword. In contrast to other approaches in the literature, the non-parametric histogram representation is compact and simple to compute. BossaNova compares well with the state of the art on several standard datasets (MIRFLICKR, ImageCLEF 2011, PASCAL VOC 2007 and 15-Scenes), even without using complex combinations of different local descriptors. It also complements the cutting-edge Fisher Vector descriptors well, showing even better results when combined with them. BossaNova also shows good results in the challenging real-world application of pornography detection.
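
The pooling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean distance, the fixed bin range (`max_dist`), the number of bins (`n_bins`) and the normalization by the number of descriptors are all assumptions made for illustration. It contrasts a plain Bag-of-Words count (one value per codeword) with a per-codeword histogram of descriptor-to-codeword distances.

```python
import numpy as np


def pairwise_distances(descriptors, codebook):
    """Euclidean distances between N descriptors and K codewords, shape (N, K)."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return np.linalg.norm(diff, axis=2)


def bow_histogram(descriptors, codebook):
    """Baseline Bag-of-Words: count descriptors hard-assigned to each codeword."""
    dists = pairwise_distances(descriptors, codebook)
    nearest = dists.argmin(axis=1)
    return np.bincount(nearest, minlength=codebook.shape[0]).astype(float)


def distance_histogram_pooling(descriptors, codebook, n_bins=4, max_dist=2.0):
    """Per-codeword histogram of descriptor-to-codeword distances.

    Assumptions (not taken from the paper): equal-width bins on
    [0, max_dist], distances beyond max_dist are ignored, and each
    histogram is divided by the number of descriptors in the image.
    """
    dists = pairwise_distances(descriptors, codebook)
    n_desc, n_words = dists.shape
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    pooled = np.empty((n_words, n_bins))
    for k in range(n_words):
        counts, _ = np.histogram(dists[:, k], bins=edges)
        pooled[k] = counts / n_desc
    # Concatenate the K histograms into one feature vector of length K * n_bins.
    return pooled.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(100, 8))  # stand-in for local descriptors
    codebook = rng.normal(size=(5, 8))       # stand-in for a learned codebook
    print(bow_histogram(descriptors, codebook).shape)               # (5,)
    print(distance_histogram_pooling(descriptors, codebook).shape)  # (20,)
```

Where Bag-of-Words keeps a single count per codeword, the sketch keeps `n_bins` values per codeword, so the pooled vector records how descriptors are spread around each codeword rather than only how many fall nearest to it.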