ObjectPatchNet: Towards scalable and semantic image annotation and retrieval

Authors:
Shiliang Zhang; Qi Tian; Gang Hua; Qingming Huang; Wen Gao
Affiliations:
Key Lab of Intelligent Information Processing, Institute of Computing Technology, CAS, Beijing 100190, China; Dept. of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, USA; Dept. of Computer Science, Stevens Institute of Technology, Hoboken, NJ 07030, USA; Graduate University of Chinese Academy of Sciences, Beijing 100049, China; School of EE & CS, Peking University, Beijing 100871, China
Venue:
Computer Vision and Image Understanding
Year:
2014

Abstract

The ever-increasing collection of Internet images densely samples real-world objects, scenes, and so on, and is commonly accompanied by metadata such as textual descriptions and user comments. Such image data has the potential to serve as a knowledge source for large-scale image applications. Leveraging this publicly available and ever-growing body of loosely annotated Internet images, we propose a scalable, data-driven solution for annotating and retrieving Web-scale image data. From large-scale loosely annotated images, we derive a compact and informative representation called ObjectPatchNet. Each vertex in ObjectPatchNet, called an ObjectPatchNode, is a collection of discriminative image patches annotated with object category labels. An edge linking two ObjectPatchNodes models the co-occurrence of their objects in the same image. ObjectPatchNet therefore models not only probabilistically labeled image patches but also the contextual relationships between objects, which makes it well suited to scalable image annotation. Moreover, by treating ObjectPatchNet as a visual vocabulary with semantic labels, we can readily build an inverted file index for efficient semantic image retrieval. We evaluate ObjectPatchNet on both large-scale image annotation and large-scale image retrieval. Experimental results show that ObjectPatchNet is both discriminative and efficient in these applications.
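The abstract describes ObjectPatchNet only at a high level. What follows is a minimal Python sketch, under our own assumptions, of the two structures it names. The first is a graph whose nodes carry a label distribution and whose edge weights count how often two nodes appear in the same image. The second is an inverted file that maps each node, used as a semantic visual word, to the images that contain it. Patch extraction and matching patches to nodes are assumed to happen upstream and are not shown. The class names, the `alpha` context boost, the `min_prob` threshold, and both scoring rules are hypothetical illustrations, not the paper's actual formulas.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class ObjectPatchNode:
    # Hypothetical node: a cluster of patches summarized by P(label | node).
    node_id: int
    label_probs: dict[str, float]


@dataclass
class ObjectPatchNetSketch:
    nodes: dict[int, ObjectPatchNode] = field(default_factory=dict)
    # Edge weight = number of images in which the two nodes co-occur.
    edges: Counter = field(default_factory=Counter)
    # Inverted file: node_id -> {image_id: patches in that image mapped to the node}.
    postings: dict[int, Counter] = field(default_factory=lambda: defaultdict(Counter))

    def add_node(self, node: ObjectPatchNode) -> None:
        self.nodes[node.node_id] = node

    def index_image(self, image_id: str, node_ids: list[int]) -> None:
        # Assumes the image's patches were already matched to node IDs.
        for nid in node_ids:
            self.postings[nid][image_id] += 1
        # Each unordered pair of distinct nodes in the image adds one co-occurrence.
        for a, b in combinations(sorted(set(node_ids)), 2):
            self.edges[(a, b)] += 1

    def context_weight(self, nid: int, present: set[int]) -> int:
        # Total co-occurrence between nid and the other nodes found in the image.
        return sum(self.edges[tuple(sorted((nid, o)))] for o in present if o != nid)

    def annotate(self, node_ids: list[int], alpha: float = 0.1, top_k: int = 3) -> list[str]:
        # Hypothetical scoring: each matched node votes with its label
        # distribution, boosted by its co-occurrence with the other matched nodes.
        present = set(node_ids)
        scores: Counter = Counter()
        for nid in node_ids:
            boost = 1.0 + alpha * self.context_weight(nid, present)
            for label, p in self.nodes[nid].label_probs.items():
                scores[label] += p * boost
        return [label for label, _ in scores.most_common(top_k)]

    def search(self, label: str, min_prob: float = 0.5) -> list[str]:
        # Semantic query: nodes whose label probability passes the threshold act
        # as visual words; images are ranked by weighted votes from their postings.
        votes: Counter = Counter()
        for nid, node in self.nodes.items():
            p = node.label_probs.get(label, 0.0)
            if p >= min_prob:
                for image_id, count in self.postings[nid].items():
                    votes[image_id] += p * count
        return [image_id for image_id, _ in votes.most_common()]


if __name__ == "__main__":
    net = ObjectPatchNetSketch()
    net.add_node(ObjectPatchNode(0, {"car": 0.9, "truck": 0.1}))
    net.add_node(ObjectPatchNode(1, {"road": 0.8, "sky": 0.2}))
    net.add_node(ObjectPatchNode(2, {"sky": 0.7, "plane": 0.3}))
    net.index_image("img_a", [0, 0, 1])
    net.index_image("img_b", [1, 2, 2])
    net.index_image("img_c", [0, 1, 2])
    print(net.annotate([0, 1]))  # ['car', 'road', 'sky']
    print(net.search("sky"))     # ['img_b', 'img_c']
```

Because the label query touches only the posting lists of nodes associated with that label, retrieval cost in this sketch scales with the size of those lists rather than with the whole collection. That is the usual motivation for inverted file indexing over a visual vocabulary.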