Notwithstanding its great success and wide adoption in the Bag-of-Visual-Words representation, a visual vocabulary created from single local image features is often ineffective, largely for three reasons. First, many detected local features are not stable enough, which produces many noisy and non-descriptive visual words in images. Second, a single visual word discards the rich spatial contextual information among the local features, which has been proven valuable for visual matching. Third, the distance metric commonly used to generate the visual vocabulary does not take semantic context into consideration, which makes the resulting vocabulary prone to noise. To address these three problems, we propose an effective visual vocabulary generation framework containing three novel contributions: 1) we propose an effective unsupervised local feature refinement strategy; 2) we consider local features in groups to model their spatial contexts; 3) we further learn a discriminant distance metric between local feature groups, which we call the discriminant group distance. This group distance is then used to induce a visual vocabulary from groups of local features. We name it the contextual visual vocabulary, since it captures both spatial and semantic contexts. We evaluate the proposed local feature refinement strategy and the contextual visual vocabulary in two large-scale image applications: near-duplicate image retrieval on a dataset containing 1.5 million images, and image search re-ranking. Our experimental results show that the contextual visual vocabulary yields a significant improvement over the classic visual vocabulary. Moreover, it outperforms the state-of-the-art Bundled Feature in terms of retrieval precision, memory consumption, and efficiency.
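For orientation, the classic visual vocabulary that the abstract uses as its baseline can be sketched as below. This is a minimal illustration, not the proposed method: it assumes the vocabulary is built by plain k-means over individual descriptors with Euclidean distance, which is the common choice but is not spelled out in the abstract. The function names and the toy 2-D descriptors are hypothetical. The contextual vocabulary described above would instead cluster groups of features under a learned group distance, which is not reproduced here.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (centroid) closest to a descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def build_vocabulary(descriptors, num_words, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over single local descriptors."""
    rng = random.Random(seed)
    vocabulary = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, vocabulary)].append(d)
        for k, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                dim = len(members[0])
                vocabulary[k] = [sum(m[i] for m in members) / len(members)
                                 for i in range(dim)]
    return vocabulary


def bag_of_words(descriptors, vocabulary):
    """Histogram of visual-word counts for one image's descriptors."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(vocabulary))]


# Toy example: five hypothetical 2-D descriptors forming two clusters.
toy_descriptors = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0]]
toy_vocabulary = build_vocabulary(toy_descriptors, num_words=2)
toy_histogram = bag_of_words(toy_descriptors, toy_vocabulary)
```

Because each descriptor is quantized independently, the histogram keeps no record of which features were spatial neighbours, which is the second limitation the abstract points out.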