Large vocabulary quantization for searching instances from videos

  • Authors: Cai-Zhi Zhu; Shin'ichi Satoh
  • Affiliations: National Institute of Informatics, Tokyo, Japan (both authors)
  • Venue: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
  • Year: 2012

Abstract

A promising application for video collections is searching a video database for segments relevant to a specific instance, e.g. a person, object, or place, given only a few visual examples. This problem is difficult due to lighting variations, viewpoint changes, partial occlusion, and large changes in appearance. In this paper, we focus on a restricted instance search task in which the region of the specific instance to be searched for is manually labeled on each query image. We formulate this problem in a Bag-of-Words framework based on large vocabulary quantization, placing particular emphasis on investigating to what extent these labeled instance regions help. The contribution of this paper is twofold: first, we propose an algorithm for instance search that outperformed all submissions on the TRECVID 2011 instance search dataset. Second, after thoroughly analyzing the experimental results, we show that our top performance is mainly due to similar-scene retrieval rather than retrieval of the same instance. This observation reveals that in the current dataset the background dominates over the instance, and it suggests that a promising direction for improving the current algorithm, and perhaps the breakthrough needed for this challenge, is to investigate how to truly exploit the additional labeled instance regions. We believe this research opens a window for future instance search methods.
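To make the Bag-of-Words pipeline the abstract refers to concrete, the following is a minimal sketch of hard quantization against a visual vocabulary and cosine-similarity ranking. All names, the toy vocabulary size, and the random data are illustrative assumptions; the paper's actual system uses a very large trained vocabulary (hence "large vocabulary quantization") and real local descriptors extracted from keyframes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained visual vocabulary: K centroids in descriptor space.
# In a large-vocabulary setting K can reach millions; we use a toy size here.
K, D = 256, 128          # vocabulary size, SIFT-like descriptor dimension
vocabulary = rng.normal(size=(K, D))

def quantize(descriptors, vocab):
    """Assign each local descriptor to its nearest visual word (hard quantization)."""
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = ((descriptors ** 2).sum(1)[:, None]
          - 2.0 * descriptors @ vocab.T
          + (vocab ** 2).sum(1)[None, :])
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocab):
    """L2-normalized Bag-of-Words histogram for one image or labeled region."""
    words = quantize(descriptors, vocab)
    hist = np.bincount(words, minlength=vocab.shape[0]).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Toy query (descriptors from the labeled instance region) and database frames.
query = bow_histogram(rng.normal(size=(50, D)), vocabulary)
frames = [bow_histogram(rng.normal(size=(200, D)), vocabulary) for _ in range(5)]

# Rank frames by cosine similarity (dot product of L2-normalized histograms).
scores = [float(query @ f) for f in frames]
ranking = np.argsort(scores)[::-1]
print(ranking, [round(s, 3) for s in scores])
```

In a practical system the linear scan over frames would be replaced by an inverted index over visual words, which is what makes very large vocabularies attractive: sparser histograms mean shorter posting lists and faster lookup.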