Building contextual visual vocabulary for large-scale image applications

Authors:
Shiliang Zhang;Qingming Huang;Gang Hua;Shuqiang Jiang;Wen Gao;Qi Tian
Affiliations:
Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Graduate University of Chinese Academy of Sciences, Beijing, China;IBM Research T. J. Watson Center, New York, USA;Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Digital Multimedia, Peking University, Beijing, China;Computer Science Depart., University of Texas at San Antonio, San Antonio, USA
Venue:
Proceedings of the international conference on Multimedia
Year:
2010

Citing 22
Cited 19

Similarity Search in High Dimensions via Hashing

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Video Google: A Text Retrieval Approach to Object Matching in Videos

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
Creating Efficient Codebooks for Visual Recognition

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Object Categorization by Learned Universal Visual Dictionary

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Scalable Recognition with a Vocabulary Tree

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Spatial Weighting for Bag-of-Features

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Discriminative Object Class Models of Appearance and Shape by Correlatons

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Universal and Adapted Vocabularies for Generic Visual Categorization

IEEE Transactions on Pattern Analysis and Machine Intelligence
Randomized Clustering Forests for Image Classification

IEEE Transactions on Pattern Analysis and Machine Intelligence
VisualRank: Applying PageRank to Large-Scale Image Search

IEEE Transactions on Pattern Analysis and Machine Intelligence
Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment

IEEE Transactions on Pattern Analysis and Machine Intelligence
Bayesian video search reranking

MM '08 Proceedings of the 16th ACM international conference on Multimedia
SIFT-Bag kernel for video event analysis

MM '08 Proceedings of the 16th ACM international conference on Multimedia
Video event detection using motion relativity and visual relatedness

MM '08 Proceedings of the 16th ACM international conference on Multimedia
Tag ranking

Proceedings of the 18th international conference on World wide web
Supervised Learning of Quantizer Codebooks by Information Loss Minimization

IEEE Transactions on Pattern Analysis and Machine Intelligence
Semantics-preserving bag-of-words models for efficient image annotation

LS-MMRM '09 Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining
Descriptive visual words and visual phrases for image applications

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Semantic context transfer across heterogeneous sources for domain adaptive video search

MM '09 Proceedings of the 17th ACM international conference on Multimedia

Building descriptive and discriminative visual codebook for large-scale image applications

Multimedia Tools and Applications
Semantic point detector

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Large scale image search with geometric coding

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Contextual synonym dictionary for visual object retrieval

MM '11 Proceedings of the 19th ACM international conference on Multimedia
From local features to local regions

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Point-context descriptor based region search for logo recognition

Proceedings of the 4th International Conference on Internet Multimedia Computing and Service
Exploiting visual word co-occurrence for image retrieval

Proceedings of the 20th ACM international conference on Multimedia
Scalar quantization for large scale image search

Proceedings of the 20th ACM international conference on Multimedia
Embedding spatial context information into inverted file for large-scale image retrieval

Proceedings of the 20th ACM international conference on Multimedia
Spatial pooling of heterogeneous features for image applications

Proceedings of the 20th ACM international conference on Multimedia
Towards measuring the visualness of a concept

Proceedings of the 21st ACM international conference on Information and knowledge management
Randomized spatial partition for scene recognition

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
SIFT match verification by geometric coding for large-scale partial-duplicate web image search

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Contextual pooling in image classification

ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part I
Image search—from thousands to billions in 20 years

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) - Special Sections on the 20th Anniversary of ACM International Conference on Multimedia, Best Papers of ACM Multimedia 2012
Multi-order visual phrase for scalable image search

Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service
Visual object analysis using regions and interest points

Proceedings of the 21st ACM international conference on Multimedia
Weighted visual vocabulary to balance the descriptive ability on general dataset

Neurocomputing
Discriminative Hough context model for object detection

The Visual Computer: International Journal of Computer Graphics

Abstract

Notwithstanding its great success and wide adoption in the Bag-of-visual-Words representation, a visual vocabulary created from single image local features is often ineffective, largely for three reasons. First, many detected local features are not stable enough, resulting in many noisy and non-descriptive visual words in images. Second, a single visual word discards the rich spatial contextual information among the local features, which has been proven to be valuable for visual matching. Third, the distance metric commonly used for generating a visual vocabulary does not take the semantic context into consideration, which makes the resulting vocabulary prone to noise. To address these three problems, we propose an effective visual vocabulary generation framework containing three novel contributions: 1) we propose an effective unsupervised local feature refinement strategy; 2) we consider local features in groups to model their spatial contexts; 3) we further learn a discriminant distance metric between local feature groups, which we call discriminant group distance. This group distance is further leveraged to induce a visual vocabulary from groups of local features. We name it the contextual visual vocabulary, which captures both the spatial and semantic contexts. We evaluate the proposed local feature refinement strategy and the contextual visual vocabulary in two large-scale image applications: near-duplicate image retrieval on a dataset containing 1.5 million images, and image search re-ranking. Our experimental results show that the contextual visual vocabulary significantly improves on the classic visual vocabulary. Moreover, it outperforms the state-of-the-art Bundled Feature in terms of retrieval precision, memory consumption and efficiency.
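
The idea of quantizing groups of local features, rather than single features, can be illustrated with a minimal sketch. This is not the authors' implementation: the pairing rule (keypoints within a fixed radius), the concatenated group descriptor, the plain squared Euclidean distance (standing in for the learned discriminant group distance, which is not reproduced here), the toy data, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def pair_features(keypoints, radius):
    """Return index pairs (i, j), i < j, whose keypoints lie within `radius`.

    Assumption: spatial context is modeled by simple proximity pairing.
    """
    pairs = []
    for i in range(len(keypoints)):
        for j in range(i + 1, len(keypoints)):
            dx = keypoints[i][0] - keypoints[j][0]
            dy = keypoints[i][1] - keypoints[j][1]
            if math.hypot(dx, dy) <= radius:
                pairs.append((i, j))
    return pairs


def group_descriptor(descriptors, pair):
    """Describe a feature group by concatenating its members' descriptors."""
    i, j = pair
    return list(descriptors[i]) + list(descriptors[j])


def sq_dist(a, b):
    """Squared Euclidean distance (a placeholder for a learned group distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(groups, codebook):
    """Assign each group descriptor to the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(g, codebook[k]))
        for g in groups
    ]


def bag_of_words(word_ids, vocab_size):
    """Histogram of visual-word occurrences for one image."""
    counts = Counter(word_ids)
    return [counts.get(k, 0) for k in range(vocab_size)]


# Toy image: four keypoints forming two spatially separated clusters.
keypoints = [(0, 0), (1, 0), (10, 10), (11, 10)]
descriptors = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
# Hypothetical two-word codebook over 4-dimensional group descriptors.
codebook = [[0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]

pairs = pair_features(keypoints, radius=2.0)
groups = [group_descriptor(descriptors, p) for p in pairs]
word_ids = quantize(groups, codebook)
histogram = bag_of_words(word_ids, len(codebook))
```

In this toy setup each nearby pair of features becomes one "word", so the histogram counts feature groups instead of isolated features. A learned distance would replace `sq_dist` without changing the rest of the pipeline.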