Video Google: A Text Retrieval Approach to Object Matching in Videos

  • Authors:
  • Josef Sivic; Andrew Zisserman

  • Venue:
  • ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
  • Year:
  • 2003

Abstract

We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors.

The analogy with text retrieval is in the implementation, where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google.

The method is illustrated for matching on two full-length feature films.
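The sketch below is a minimal illustration of the retrieval scheme the abstract describes: region descriptors are assumed to have already been vector-quantized into "visual word" ids, each key frame is treated as a document, and an inverted file with tf-idf weighting returns a ranked list of frames for a query. The class and method names (VisualIndex, add_frame, query) are hypothetical and not the authors' implementation.

```python
# Toy inverted-file retrieval over "visual words", assuming descriptors
# are already quantized to integer word ids (the paper's vector quantization
# step is not shown here).
from collections import Counter, defaultdict
from math import log, sqrt


class VisualIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # word id -> {frame id: term frequency}
        self.frame_norm = {}               # frame id -> tf-idf vector length
        self.n_frames = 0

    def add_frame(self, frame_id, words):
        """Index one key frame given its quantized descriptor (visual word) ids."""
        self.n_frames += 1
        for word, tf in Counter(words).items():
            self.postings[word][frame_id] = tf

    def _idf(self, word):
        # Inverse document frequency over key frames.
        return log(self.n_frames / len(self.postings[word]))

    def finalize(self):
        """Pre-compute per-frame tf-idf norms so queries only touch postings lists."""
        sq = defaultdict(float)
        for word, posting in self.postings.items():
            idf = self._idf(word)
            for frame_id, tf in posting.items():
                sq[frame_id] += (tf * idf) ** 2
        self.frame_norm = {f: sqrt(v) for f, v in sq.items()}

    def query(self, words, top_k=5):
        """Rank key frames by cosine similarity between tf-idf vectors."""
        q_tf = Counter(w for w in words if w in self.postings)
        q_weights = {w: tf * self._idf(w) for w, tf in q_tf.items()}
        q_norm = sqrt(sum(v * v for v in q_weights.values())) or 1.0
        scores = defaultdict(float)
        for word, q_w in q_weights.items():
            idf = self._idf(word)
            for frame_id, tf in self.postings[word].items():
                scores[frame_id] += q_w * tf * idf
        ranked = sorted(
            ((s / (q_norm * self.frame_norm[f]), f) for f, s in scores.items()),
            reverse=True,
        )
        return ranked[:top_k]


if __name__ == "__main__":
    index = VisualIndex()
    # Toy data: visual word ids per key frame.
    index.add_frame("shot1_frame3", [5, 5, 12, 40, 7])
    index.add_frame("shot2_frame1", [12, 12, 12, 99])
    index.add_frame("shot7_frame4", [5, 40, 40, 7, 7, 7])
    index.finalize()
    print(index.query([5, 40, 7]))  # frames sharing the query's words rank first
```

Because the postings lists and per-frame norms are built once up front, a query only touches the index entries for its own visual words, which is what makes the ranked retrieval effectively immediate.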