Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection

Authors:
Piotr Koniusz;Fei Yan;Krystian Mikolajczyk
Affiliations:
Centre for Vision, Speech and Signal Processing, University of Surrey, GU2 7XH Guildford, UK (all three authors)
Venue:
Computer Vision and Image Understanding
Year:
2013

Abstract

Bag-of-Words lies at the heart of modern object category recognition systems. After descriptors are extracted from images, they are expressed as vectors representing visual word content, referred to as mid-level features. In this paper, we review a number of techniques for generating mid-level features, including two variants of Soft Assignment, Locality-constrained Linear Coding, and Sparse Coding. We also isolate the underlying properties that affect their performance. Moreover, we investigate various pooling methods that aggregate mid-level features into vectors representing images. Average pooling, Max-pooling, and a family of likelihood-inspired pooling strategies are scrutinised. We demonstrate how coding schemes and pooling methods interact with each other. We generalise the investigated pooling methods to account for descriptor interdependence and introduce an intuitive concept of improved pooling. We also propose a coding-related improvement that speeds up the coding step. Lastly, we demonstrate state-of-the-art classification performance on the Caltech101, Flower17, and ImageCLEF11 datasets.
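To make the coding and pooling pipeline described in the abstract concrete, the following is a minimal sketch of one common form of Soft Assignment coding followed by Average pooling and Max-pooling. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian membership form, the `sigma` smoothing parameter, the function names, and the random toy descriptors and codebook are all hypothetical choices made here. The paper's specific Soft Assignment variants, Locality-constrained Linear Coding, Sparse Coding, and the likelihood-inspired pooling family are not reproduced.

```python
import numpy as np

def soft_assign(descriptors, codebook, sigma=1.0):
    """Code each descriptor as normalised Gaussian memberships over visual words.

    descriptors: (N, D) array of local descriptors (assumed input layout).
    codebook:    (K, D) array of visual words (assumed input layout).
    Returns an (N, K) array of mid-level features; each row sums to 1.
    """
    # Squared Euclidean distance between every descriptor and every visual word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Gaussian responses in log space; subtract the row maximum for stability.
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def average_pool(codes):
    """Average pooling: mean response of each visual word over all descriptors."""
    return codes.mean(axis=0)

def max_pool(codes):
    """Max-pooling: strongest response of each visual word over all descriptors."""
    return codes.max(axis=0)

# Toy data standing in for descriptors from one image and a learned codebook.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))   # 200 descriptors of dimension 8
codebook = rng.normal(size=(16, 8))       # 16 visual words

codes = soft_assign(descriptors, codebook)   # mid-level features, shape (200, 16)
avg_signature = average_pool(codes)          # image signature, shape (16,)
max_signature = max_pool(codes)              # image signature, shape (16,)
```

Both pooling functions reduce the per-descriptor codes to a single vector per image with one entry per visual word. Average pooling keeps the overall frequency of each word, while Max-pooling keeps only its strongest response, which is one reason the choice of pooling can interact with the choice of coding.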