Representing and recognizing objects with massive local image patches

  • Authors:
  • Liang Lin; Ping Luo; Xiaowu Chen; Kun Zeng

  • Affiliations:
  • School of Software, Sun Yat-Sen University, Guangzhou 510006, China (Lin, Luo, Zeng); School of Computer Science and Engineering, Beihang University, Beijing 100191, China (Chen)

  • Venue:
  • Pattern Recognition
  • Year:
  • 2012

Abstract

Natural image patches are fundamental elements for visual pattern modeling and recognition. By studying the intrinsic manifold structures in the space of image patches, this paper proposes an approach for representing and recognizing objects with a massive number of local image patches (e.g. 17×17 pixels). Given a large collection (~10^4) of proto image patches extracted from objects, we map them into two types of manifolds with different metrics: explicit manifolds of low dimension for structural primitives, and implicit manifolds of high dimension for stochastic textures. We define these manifolds grown from patches as "ε-balls", where ε corresponds to the perceptual residual or fluctuation. Using these ε-balls as features, we present a novel generative learning algorithm based on the information projection principle. The algorithm pursues the object models in a stepwise greedy fashion, selecting sparse and independent ε-balls (around 10^3 per category). During the detection and classification phase, only a small number of features (around 20) are activated by a fast KD-tree indexing technique. The proposed method has two distinguishing characteristics: (1) features (ε-balls) are generated automatically from local image patches rather than being carefully hand-designed for each category; (2) unlike the weak classifiers in boosting models, the selected ε-ball features explain objects in a generative way and are mutually independent. The advantages and performance of our approach are evaluated on several challenging datasets on the task of localizing objects under appearance variation, occlusion, and background clutter.
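
For concreteness, the sketch below illustrates the kind of KD-tree lookup the abstract alludes to for activating only a small number of ε-ball features at detection time. It is a minimal illustration under assumed names and sizes (feature_centers, feature_radii, the use of scipy.spatial.cKDTree, and the random stand-in data), not the authors' implementation.

```python
# Sketch: store learned patch features ("epsilon-balls") in a KD-tree so that,
# for a candidate image patch, only the nearest few features are evaluated.
# All names, sizes, and data here are illustrative assumptions.
import numpy as np
from scipy.spatial import cKDTree

PATCH_SIZE = 17          # abstract: local patches of e.g. 17x17 pixels
N_FEATURES = 1000        # abstract: ~10^3 selected features per category
N_ACTIVATED = 20         # abstract: ~20 features activated per query

rng = np.random.default_rng(0)

# Stand-in for learned feature centers (one flattened patch per feature)
# and their radii epsilon (the tolerated perceptual residual).
feature_centers = rng.standard_normal((N_FEATURES, PATCH_SIZE * PATCH_SIZE))
feature_radii = rng.uniform(5.0, 15.0, size=N_FEATURES)

# Build the KD-tree once over the feature centers.
tree = cKDTree(feature_centers)

def activate_features(patch: np.ndarray):
    """Return indices of epsilon-balls containing the query patch,
    checking only the N_ACTIVATED nearest feature centers."""
    query = patch.reshape(-1)
    dists, idx = tree.query(query, k=N_ACTIVATED)
    # A feature "fires" when the patch lies within its epsilon radius.
    return [int(i) for d, i in zip(dists, idx) if d <= feature_radii[i]]

# Example query with a random patch.
test_patch = rng.standard_normal((PATCH_SIZE, PATCH_SIZE))
print(activate_features(test_patch))
```

In practice the query would be repeated for patches sampled across candidate object windows, with the activated ε-balls scored under the learned generative model; that scoring step is specific to the paper and is not reproduced here.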