Spatial pooling of heterogeneous features for image applications

Authors:
Lingxi Xie;Qi Tian;Bo Zhang
Affiliations:
Tsinghua University, Beijing, China;University of Texas at San Antonio, San Antonio, TX, USA;Tsinghua University, Beijing, China
Venue:
Proceedings of the 20th ACM international conference on Multimedia
Year:
2012

Abstract

The Bag-of-Features (BoF) model plays an important role in image representation for many multimedia applications. It has been applied extensively to tasks such as image classification, image retrieval, and scene understanding. Despite its simplicity, efficiency, and generality, the model has notable drawbacks: local descriptors have limited power of semantic expression, and single visual words lack robust structure. Various techniques have been proposed to overcome these problems, such as multiple descriptors, spatial context modeling, and interest region detection. Although each has been shown to improve the BoF model to some extent, a coherent scheme for integrating the individual modules is still lacking. To address these problems, we propose a novel framework based on spatial pooling of heterogeneous features. Our framework differs from the traditional Bag-of-Features model in three aspects. First, we propose a new scheme for combining texture-based and edge-based local features at the descriptor extraction level. Next, we build geometric visual phrases that model spatial context over the heterogeneous features, yielding a mid-level representation of images. Finally, we apply a simple and effective spatial weighting scheme, based on a smoothed edgemap, to this mid-level image representation. We test the integrated framework on several benchmark datasets for image classification and retrieval. Extensive experimental results show that our algorithm outperforms state-of-the-art methods.
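
To make the last of the three steps more concrete, the following is a minimal sketch of how a smoothed edgemap might weight features during pooling. The abstract does not give the weighting formula, so everything here is an assumption made for illustration: the mean (box) filter used for smoothing, the L1 normalization, and the names `box_smooth` and `weighted_bof_histogram` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def box_smooth(edge_map: Sequence[Sequence[float]], radius: int = 1) -> List[List[float]]:
    """Smooth a 2-D edge map with a mean filter; windows are clipped at the borders.

    Assumption: a box filter stands in for whatever smoothing the paper uses.
    """
    h, w = len(edge_map), len(edge_map[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += edge_map[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def weighted_bof_histogram(
    features: Sequence[Tuple[int, int, int]],
    weight_map: Sequence[Sequence[float]],
    vocab_size: int,
) -> List[float]:
    """Pool quantized features into a histogram, weighting each by its location.

    Each feature is (visual_word_index, x, y). The weight is read from
    weight_map[y][x]; the result is L1-normalized (an assumed choice).
    """
    hist = [0.0] * vocab_size
    for word, x, y in features:
        hist[word] += weight_map[y][x]
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


# Toy 4x4 edge map: edges in the two left columns, none on the right.
edge_map = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
weights = box_smooth(edge_map, radius=1)

# Three quantized features over a two-word vocabulary.
features = [(0, 0, 0), (1, 3, 3), (1, 2, 1)]
histogram = weighted_bof_histogram(features, weights, vocab_size=2)
```

In this toy setup, features that fall in edge-rich regions contribute more to the pooled histogram than features in flat regions, which is the general idea behind spatial weighting; the paper's actual scheme operates on its mid-level representation and may differ in every detail.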