Beyond spatial pyramids: a new feature extraction framework with dense spatial sampling for image classification

Authors:
Shengye Yan; Xinxing Xu; Dong Xu; Stephen Lin; Xuelong Li
Affiliations:
School of Computer Engineering, Nanyang Technological University, Singapore; School of Computer Engineering, Nanyang Technological University, Singapore; School of Computer Engineering, Nanyang Technological University, Singapore; Microsoft Research Asia, China; OPTIMAL, State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, China
Venue:
ECCV'12 Proceedings of the 12th European Conference on Computer Vision - Volume Part IV
Year:
2012

Abstract

We introduce a new framework for image classification that extends beyond the window sampling of fixed spatial pyramids to include a comprehensive set of windows densely sampled over location, size and aspect ratio. To handle this large set of windows effectively, we derive a concise high-level image feature using a two-level extraction method. At the first level, window-based features are computed from local descriptors (e.g., SIFT, spatial HOG, LBP) in a process similar to standard feature extractors. At the second level, the new image feature is determined from the window-based features in a manner analogous to the first level. This higher level of abstraction offers both efficient handling of dense samples and reduced sensitivity to misalignment. More importantly, our simple yet effective framework can readily accommodate a large number of existing pooling/coding methods, allowing them to extract features beyond the spatial pyramid representation. To fuse the second-level feature with a standard first-level image feature for classification, we additionally propose a new learning algorithm, called Generalized Adaptive ℓp-norm Multiple Kernel Learning (GA-MKL), which learns an adapted robust classifier based on multiple base kernels constructed from image features and multiple sets of pre-learned classifiers of all the classes. Extensive evaluation on the object recognition (Caltech256) and scene recognition (15Scenes) benchmark datasets demonstrates that the proposed method outperforms state-of-the-art image classification algorithms under a broad range of settings.
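
The two-level extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the coding or pooling method, so a plain hard-assignment bag-of-words histogram is swapped in at both levels. The function names (`sample_windows`, `encode`, `two_level_feature`), the window grid parameters, and the two codebooks are all hypothetical, and the random inputs stand in for real local descriptors such as SIFT.

```python
import numpy as np


def sample_windows(height, width, sizes=(0.5, 1.0), aspects=(0.5, 1.0, 2.0), stride=0.25):
    """Enumerate windows (x0, y0, x1, y1) over location, size and aspect ratio.

    The grid parameters are illustrative assumptions, not values from the paper.
    """
    windows = []
    for s in sizes:
        for a in aspects:
            w = min(width, int(round(s * width * np.sqrt(a))))
            h = min(height, int(round(s * height / np.sqrt(a))))
            if w < 1 or h < 1:
                continue
            step_x = max(1, int(round(stride * w)))
            step_y = max(1, int(round(stride * h)))
            for y0 in range(0, height - h + 1, step_y):
                for x0 in range(0, width - w + 1, step_x):
                    windows.append((x0, y0, x0 + w, y0 + h))
    return windows


def encode(vectors, codebook):
    """Hard-assign each vector to its nearest codeword; return an L1-normalised histogram."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    hist = np.bincount(idx, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)


def two_level_feature(positions, descriptors, image_shape, codebook1, codebook2):
    """Level 1: one feature per window from local descriptors.
    Level 2: one image feature from the set of window features."""
    height, width = image_shape
    window_feats = []
    for x0, y0, x1, y1 in sample_windows(height, width):
        inside = (
            (positions[:, 0] >= x0) & (positions[:, 0] < x1)
            & (positions[:, 1] >= y0) & (positions[:, 1] < y1)
        )
        if inside.any():  # skip windows that contain no descriptors
            window_feats.append(encode(descriptors[inside], codebook1))
    return encode(np.array(window_feats), codebook2)


# Synthetic stand-ins for real data (all values are made up for illustration).
rng = np.random.default_rng(0)
positions = rng.uniform(0, 64, size=(200, 2))   # (x, y) of each local descriptor
descriptors = rng.normal(size=(200, 8))         # 8-D toy descriptors
codebook1 = rng.normal(size=(16, 8))            # first-level codebook
codebook2 = rng.random(size=(10, 16))           # second-level codebook over window features
codebook2 /= codebook2.sum(axis=1, keepdims=True)

feature = two_level_feature(positions, descriptors, (64, 64), codebook1, codebook2)
print(feature.shape)
```

The second level treats each window feature exactly as the first level treats a local descriptor, which is why the same `encode` routine is reused. In this sketch, the length of the final feature depends only on the size of the second-level codebook, not on the number of windows sampled.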