Efficient Visual Search of Videos Cast as Text Retrieval

Authors:
Josef Sivic;Andrew Zisserman
Affiliations:
INRIA, WILLOW Project-Team, CNRS/ENS/INRIA UMR, France;University of Oxford, Oxford
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2009

Citing 0
Cited 19

Unsupervised writer adaptation of whole-word HMMs with application to word-spotting

Pattern Recognition Letters
Travelmedia: An intelligent management system for media captured in travel

Journal of Visual Communication and Image Representation
Pairwise weak geometric consistency for large scale image search

Proceedings of the 1st ACM International Conference on Multimedia Retrieval
Efficient and robust alignment of unsynchronized video sequences

DAGM'11 Proceedings of the 33rd international conference on Pattern recognition
Immediate structured visual search for medical images

MICCAI'11 Proceedings of the 14th international conference on Medical image computing and computer-assisted intervention - Volume Part III
Common visual pattern discovery via graph matching

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Contextual image search

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Spatially-aware indexing for image object retrieval

Proceedings of the fifth ACM international conference on Web search and data mining
Content-Based retrieval in endomicroscopy: toward an efficient smart atlas for clinical diagnosis

MCBR-CDS'11 Proceedings of the Second MICCAI international conference on Medical Content-Based Retrieval for Clinical Decision Support
An efficient approach to content-based object retrieval in videos

Neurocomputing
Recommending Flickr groups with social topic model

Information Retrieval
Query-driven iterated neighborhood graph search for large scale indexing

Proceedings of the 20th ACM international conference on Multimedia
Exploiting eye-hand coordination to detect grasping movements

Image and Vision Computing
Detection of near-duplicate patches in random images using keypoint-based features

ACIVS'12 Proceedings of the 14th international conference on Advanced Concepts for Intelligent Vision Systems
Randomized spatial partition for scene recognition

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
SmartVisionApp: A framework for computer vision applications on mobile devices

Expert Systems with Applications: An International Journal
Knowledge-based extraction of intellectual capital-related information from unstructured data

Expert Systems with Applications: An International Journal
3D object retrieval via range image queries in a bag-of-visual-words context

The Visual Computer: International Journal of Computer Graphics

Quantified Score

Hi-index	0.15

Abstract

We describe an approach to object retrieval that searches for and localizes all occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination, and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject those that are unstable. Efficient retrieval is achieved by employing methods from statistical text retrieval, including inverted file systems and text and document frequency weightings. This requires a visual analogy of a word, which is provided here by vector quantizing the region descriptors. The final ranking also depends on the spatial layout of the regions. The result is that retrieval is immediate, returning a ranked list of shots in the manner of Google. We report results for object retrieval on the full-length feature films 'Groundhog Day', 'Casablanca', and 'Run Lola Run', including searches from within the movie and searches specified by external images downloaded from the Internet. We investigate retrieval performance with respect to different quantizations of region descriptors and compare the performance of several ranking measures.
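
The text-retrieval analogy in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes region descriptors are plain tuples of floats, that a codebook of centroids is already available (for example from k-means), and that frames are ranked by cosine similarity of tf-idf vectors, a standard text-retrieval choice that the abstract does not spell out. The names nearest_word, build_index, and search are hypothetical, and the tracking and spatial-layout steps are omitted.

```python
import math
from collections import Counter, defaultdict


def nearest_word(descriptor, centroids):
    """Vector quantization: index of the closest centroid (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, centroids[k])),
    )


def build_index(frames, centroids):
    """Build an inverted file from {frame_id: [descriptor, ...]}.

    Returns (inverted, idf, norms), where inverted maps each visual word
    to {frame_id: tf-idf weight}.
    """
    counts = {
        fid: Counter(nearest_word(d, centroids) for d in descs)
        for fid, descs in frames.items()
    }
    n_frames = len(counts)
    # Document frequency: number of frames in which each word occurs.
    df = Counter(w for c in counts.values() for w in c)
    idf = {w: math.log(n_frames / df[w]) for w in df}

    inverted = defaultdict(dict)
    norms = {}
    for fid, c in counts.items():
        total = sum(c.values())
        weights = {w: (n / total) * idf[w] for w, n in c.items()}
        norms[fid] = math.sqrt(sum(v * v for v in weights.values()))
        for w, v in weights.items():
            inverted[w][fid] = v
    return inverted, idf, norms


def search(query_descs, centroids, inverted, idf, norms):
    """Rank frames by cosine similarity to the query's tf-idf vector."""
    qc = Counter(nearest_word(d, centroids) for d in query_descs)
    total = sum(qc.values())
    qw = {w: (n / total) * idf.get(w, 0.0) for w, n in qc.items()}
    qnorm = math.sqrt(sum(v * v for v in qw.values()))

    # Only frames sharing a word with the query are touched, which is
    # what makes the inverted file efficient.
    scores = defaultdict(float)
    for w, v in qw.items():
        for fid, fv in inverted.get(w, {}).items():
            scores[fid] += v * fv

    ranked = [
        (fid, s / (qnorm * norms[fid]))
        for fid, s in scores.items()
        if qnorm > 0 and norms[fid] > 0
    ]
    return sorted(ranked, key=lambda pair: -pair[1])


# Toy data: three "frames" with 2-D descriptors and a three-word codebook.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
frames = {
    "a": [(0.1, 0.2), (9.8, 0.1)],
    "b": [(0.2, 0.1), (0.1, 9.9)],
    "c": [(0.0, 0.1), (0.3, 0.0)],
}
inverted, idf, norms = build_index(frames, centroids)
results = search([(10.1, 0.2)], centroids, inverted, idf, norms)
```

In this toy data the word near (0, 0) appears in every frame, so its inverse document frequency is zero and it contributes nothing to the ranking. Only frame "a" shares a discriminative word with the query.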