Spatial pooling for transformation invariant image representation

Authors:
Xia Li;Yan Song;Yijuan Lu;Qi Tian
Affiliations:
University of Texas at San Antonio, San Antonio, TX, USA;University of Science and Technology of China, Hefei, China;Texas State University, San Marcos, TX, USA;University of Texas at San Antonio, San Antonio, TX, USA
Venue:
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Year:
2011

References:
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision
Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision
Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories. CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) Volume 12
A Bayesian Hierarchical Model for Learning Natural Scene Categories. CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
Scalable Recognition with a Vocabulary Tree. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Semi-supervised kernel density estimation for video annotation. Computer Vision and Image Understanding
Descriptive visual words and visual phrases for image applications. MM '09 Proceedings of the 17th ACM international conference on Multimedia
Unified video annotation via multigraph learning. IEEE Transactions on Circuits and Systems for Video Technology
The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision
Spatial coding for large scale partial-duplicate web image search. Proceedings of the international conference on Multimedia
Vlfeat: an open and portable library of computer vision algorithms. Proceedings of the international conference on Multimedia

Abstract

Spatial Pyramid Matching (SPM) [2] was proposed to extend the Bag-of-Words (BoW) model for object classification. By preserving finer-level information, it makes image matching more accurate. However, for images that are not well aligned, where the object is rotated, flipped, or translated, SPM may lose its discriminative power. To tackle this problem, we propose novel spatial pooling layouts that address various transformations and generate a more general image representation. To evaluate the effectiveness of the proposed approach, we conduct extensive experiments on three transformation-emphasized datasets for the object classification task. Experimental results demonstrate its superiority over the state of the art. In addition, the proposed image representation is compact and consistent with the BoW model, which makes it applicable to image retrieval as well.
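
The sensitivity of grid-based pooling to misalignment can be illustrated with a minimal sketch. It assumes each local feature is a hypothetical (x, y, visual_word) triple with coordinates normalized to [0, 1]. The `grid_pool` function mimics a single SPM-style grid level. The concentric-ring layout in `ring_pool` is only an illustrative assumption of a layout that tolerates flips and rotations about the image center; the abstract does not specify the layouts actually proposed, so this should not be read as the authors' method.

```python
import math

def grid_pool(features, vocab_size, cells):
    """Concatenate one visual-word histogram per cell of a cells x cells grid."""
    hist = [0] * (cells * cells * vocab_size)
    for x, y, word in features:
        col = min(int(x * cells), cells - 1)  # clamp x == 1.0 into the last cell
        row = min(int(y * cells), cells - 1)
        hist[(row * cells + col) * vocab_size + word] += 1
    return hist

def ring_pool(features, vocab_size, rings):
    """Concatenate one histogram per concentric ring around the image center.

    Hypothetical layout for illustration only, not taken from the paper.
    """
    max_r = math.sqrt(0.5)  # distance from the center (0.5, 0.5) to a corner
    hist = [0] * (rings * vocab_size)
    for x, y, word in features:
        r = math.hypot(x - 0.5, y - 0.5)
        ring = min(int(r / max_r * rings), rings - 1)
        hist[ring * vocab_size + word] += 1
    return hist

def flip_horizontal(features):
    """Mirror feature positions left to right; visual words are unchanged."""
    return [(1.0 - x, y, word) for x, y, word in features]

# Toy image: four features drawn from a vocabulary of three visual words.
features = [(0.1, 0.2, 0), (0.8, 0.2, 1), (0.45, 0.55, 2), (0.7, 0.9, 0)]
flipped = flip_horizontal(features)

# The 2 x 2 grid representation changes under the flip ...
grid_changed = grid_pool(features, 3, 2) != grid_pool(flipped, 3, 2)
# ... while the ring representation and the plain BoW histogram do not.
ring_unchanged = ring_pool(features, 3, 2) == ring_pool(flipped, 3, 2)
bow_unchanged = grid_pool(features, 3, 1) == grid_pool(flipped, 3, 1)
```

In this toy case a 1 x 1 grid reduces to the plain BoW histogram, which is invariant to the flip but carries no spatial information, while the 2 x 2 grid keeps spatial information but is not invariant. A transformation-aware layout aims to sit between these two extremes.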