Contextual Bag-of-Words for Visual Categorization
IEEE Transactions on Circuits and Systems for Video Technology
The bag-of-visual-words (BoW) image representation has been shown to be one of the most promising solutions for large-scale near-duplicate image retrieval. However, the traditional visual vocabulary is created in an unsupervised way by clustering a large number of local image features, which is not ideal because it largely ignores the semantic and spatial contexts among those features. In this paper, we propose the geometric visual vocabulary, which captures spatial context by quantizing local features in a bi-space, i.e., the descriptor space and the orientation space. We then capture semantic context by learning a semantic-aware distance metric between local features that reasonably measures the semantic similarity between the image patches from which the local features are extracted. The learned distance is used to cluster the local features into a semantic visual vocabulary. Finally, we combine the spatial and semantic contexts in a unified framework by extracting local feature groups, computing the spatial configuration of the local features within each group, and learning a semantic-aware distance between groups. The learned group distance is then used to cluster the extracted local feature groups into a novel visual vocabulary, i.e., the contextual visual vocabulary. The proposed visual vocabularies, i.e., the geometric, semantic, and contextual visual vocabularies, are evaluated on large-scale near-duplicate image retrieval. The geometric and semantic visual vocabularies both outperform the traditional visual vocabulary. Moreover, the contextual visual vocabulary, which combines both spatial and semantic cues, outperforms the state-of-the-art bundled-feature approach in both retrieval precision and efficiency.
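To make the bi-space quantization idea concrete, the following is a minimal sketch, assuming SIFT-like descriptors with associated keypoint orientations: local features are clustered by a plain k-means in descriptor space, orientations are quantized into uniform bins, and a geometric visual word is the pair of the two indices. The function name, parameter choices, and the simple k-means loop are illustrative, not the paper's actual implementation.

```python
import numpy as np

def build_geometric_vocabulary(descriptors, orientations,
                               k_desc=8, n_ori_bins=4, iters=10, seed=0):
    """Toy bi-space quantization: k-means in descriptor space plus
    uniform binning in orientation space (names/values are illustrative).

    descriptors:  (N, D) array of local feature descriptors
    orientations: (N,) array of keypoint orientations in radians
    Returns the descriptor-space centers and one geometric visual
    word index per feature.
    """
    rng = np.random.default_rng(seed)

    # Plain k-means in descriptor space.
    centers = descriptors[rng.choice(len(descriptors), k_desc, replace=False)]
    for _ in range(iters):
        # Distance of every descriptor to every center -> (N, k_desc).
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :],
                               axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k_desc):
            members = assign == c
            if members.any():
                centers[c] = descriptors[members].mean(axis=0)

    # Uniform quantization of orientation over [0, 2*pi).
    ori_bins = (orientations % (2 * np.pi)) / (2 * np.pi) * n_ori_bins
    ori_bins = ori_bins.astype(int).clip(0, n_ori_bins - 1)

    # A geometric visual word is the (descriptor word, orientation bin) pair,
    # flattened into a single index.
    words = assign * n_ori_bins + ori_bins
    return centers, words
```

Two features with similar descriptors but very different orientations fall into different geometric words, which is the spatial-context effect the bi-space quantization is after; the semantic vocabulary would instead replace the Euclidean distance in the k-means step with a learned semantic-aware metric.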