Instance Search (INS) is a realistic problem initiated by TRECVID: retrieve all occurrences of a query object, location, or person from a large video collection. It is a fundamental problem with many applications, and it is harder than traditional concept search or near-duplicate (ND) search because relevance is defined at the instance level. True responses can exhibit large visual variations, such as occupying only a small part of the image against a different background, or showing a spatial configuration that cannot be related to the query by a homography. Building on the Bag-of-Words model, we propose two techniques tailored for Instance Search. Specifically, we explore the use of (1) an elastic spatial topology checking technique based on Delaunay Triangulation (DT), and (2) a practical background context modeling method that simulates the "stare" behavior of human eyes. With DT, we improve the quality of visual matching by accumulating evidence from local topology-preserving patches, which significantly boosts the ranks of topology-consistent results. With the "stare" model, we increase the amount of information available for visual matching, so that instances appearing against both similar and different backgrounds can be ranked highly. The proposed techniques are evaluated on the INS datasets of TRECVID, where they achieve large performance gains with small computational overhead compared with several existing methods.
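
As an illustration of the idea behind DT-based topology checking, the following minimal sketch triangulates the matched keypoints in the query and in a candidate frame and measures how many triangles the two triangulations share. It is an assumption-laden toy, not the method evaluated in this work: the function names (`delaunay_triangles`, `topology_score`), the brute-force triangulation, and the shared-triangle ratio used as a score are all hypothetical choices made for readability.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of CCW triangle abc."""
    (ax, ay), (bx, by), (cx, cy) = [(q[0] - p[0], q[1] - p[1]) for q in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 1e-9


def delaunay_triangles(points):
    """Brute-force Delaunay triangulation, adequate only for small point sets.

    Returns a set of sorted index triples (i, j, k) whose circumcircle
    contains no other input point (the empty-circumcircle property).
    """
    n = len(points)
    triangles = set()
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        o = _orient(a, b, c)
        if abs(o) < 1e-9:
            continue  # skip collinear triples
        if o < 0:
            b, c = c, b  # enforce counter-clockwise order for the test below
        others = (points[m] for m in range(n) if m not in (i, j, k))
        if not any(_in_circumcircle(a, b, c, p) for p in others):
            triangles.add((i, j, k))
    return triangles


def topology_score(query_pts, target_pts):
    """Fraction of query triangles that also appear in the target triangulation.

    query_pts[i] and target_pts[i] are assumed to be the two ends of the
    i-th visual-word match between the query and a candidate frame.
    """
    q = delaunay_triangles(query_pts)
    t = delaunay_triangles(target_pts)
    return len(q & t) / len(q) if q else 0.0


# Five matched keypoints: four corners of a square plus its centre.
query = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
# Non-rigid but topology-preserving deformation of the same layout.
deformed = [(0, 0), (5, 1), (4, 5), (-1, 4), (2, 2.5)]
# Same corners, but the fifth match lands far outside the square.
scrambled = [(0, 0), (4, 0), (4, 4), (0, 4), (10, 2)]

print(topology_score(query, deformed))
print(topology_score(query, scrambled))
```

Because the score is computed triangle by triangle, a candidate can keep part of its score when only some local neighbourhoods survive, which is the sense in which such a check is more elastic than fitting a single global transformation. A practical system would replace the brute-force triangulation with a library routine.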