Searching visual instances with topology checking and context modeling

Authors:
Wei Zhang;Chong-Wah Ngo
Affiliations:
City University of Hong Kong, Hong Kong, Hong Kong;City University of Hong Kong, Hong Kong, Hong Kong
Venue:
Proceedings of the 3rd ACM International Conference on Multimedia Retrieval
Year:
2013

Abstract

Instance Search (INS), a task introduced by TRECVID, aims to retrieve all occurrences of a query object, location, or person from a large video collection. It is a fundamental problem with many applications, and it differs from traditional concept search and near-duplicate (ND) search because relevance is defined at the instance level. True responses can exhibit wide visual variation: the instance may occupy only a small part of the image, appear against a different background, or show a spatial configuration that cannot be described by a homography. Building on the Bag-of-Words model, we propose two techniques tailored to Instance Search: (1) an elastic spatial topology checking technique based on Delaunay Triangulation (DT), and (2) a practical background context modeling method that simulates the "stare" behavior of human eyes. With DT, we improve the quality of visual matching by accumulating evidence from local topology-preserving patches, which significantly boosts the ranks of topology-consistent results. With the "stare" model, we increase the amount of information available for visual matching, so that instances appearing against either similar or different backgrounds can be ranked highly. The proposed techniques are evaluated on the TRECVID INS datasets and achieve large performance gains with small computational overhead compared with several existing methods.
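
To make the idea of DT-based topology checking concrete, the following is a minimal sketch, not the authors' implementation. It assumes that matched keypoints are given as two coordinate lists in which index i in the query corresponds to index i in the candidate frame, builds a Delaunay triangulation on each side with a brute-force empty-circumcircle test, and scores consistency as the Jaccard overlap of the two edge sets. The function names (`delaunay_edges`, `topology_score`), the Jaccard scoring, and the toy coordinates are all illustrative assumptions; the abstract does not specify how evidence is accumulated.

```python
from itertools import combinations

def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    if _orient(a, b, c) < 0:
        b, c = c, b  # the determinant test below expects counter-clockwise order
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 1e-12

def delaunay_edges(points):
    """Edges (i, j) with i < j of a Delaunay triangulation of `points`.

    Brute force: a triangle is kept if no other point lies strictly inside
    its circumcircle. Suitable only for the small sets of matched keypoints
    used in this illustration.
    """
    edges = set()
    n = len(points)
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        if abs(_orient(a, b, c)) < 1e-12:
            continue  # skip degenerate (collinear) triples
        if any(_in_circumcircle(a, b, c, points[m])
               for m in range(n) if m not in (i, j, k)):
            continue
        edges.update({(i, j), (i, k), (j, k)})
    return edges

def topology_score(query_pts, cand_pts):
    """Jaccard overlap of the DT edge sets (an assumed scoring rule)."""
    eq, ec = delaunay_edges(query_pts), delaunay_edges(cand_pts)
    union = eq | ec
    return len(eq & ec) / len(union) if union else 0.0

# Toy matched keypoints: four corners of a square plus its center.
query = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
# Same layout, scaled by 2 and translated: the topology is preserved.
consistent = [(10, 5), (18, 5), (18, 13), (10, 13), (14, 9)]
# Two matches swapped (indices 0 and 4): the topology is disturbed.
scrambled = [(14, 9), (18, 5), (18, 13), (10, 13), (10, 5)]

print(topology_score(query, consistent))
print(topology_score(query, scrambled))
```

In this sketch a candidate whose matched points keep the same neighborhood structure receives the full score even though scale and position differ, while swapped matches lower the score. A practical system would use an O(n log n) triangulation routine and would likely accumulate evidence per triangle or local patch rather than over the whole edge set.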