Searching visual instances with topology checking and context modeling

  • Authors:
  • Wei Zhang; Chong-Wah Ngo

  • Affiliations:
  • City University of Hong Kong, Hong Kong (both authors)

  • Venue:
  • Proceedings of the 3rd ACM International Conference on Multimedia Retrieval
  • Year:
  • 2013

Abstract

Instance Search (INS), a task introduced by TRECVID, aims to retrieve all occurrences of a query object, location, or person from a large video collection. It is a fundamental problem with many applications, and it is more challenging than traditional concept or near-duplicate (ND) search because relevance is defined at the instance level. True responses can vary widely in appearance: an instance may occupy a small image region against a different background, or appear in a spatial configuration that cannot be described by a homography. Building on the Bag-of-Words model, we propose two techniques tailored to Instance Search: (1) an elastic spatial topology checking technique based on Delaunay Triangulation (DT), and (2) a practical background context modeling method that simulates the "stare" behavior of human eyes. With DT, we improve the quality of visual matching by accumulating evidence from local topology-preserving patches, which significantly boosts the ranks of topology-consistent results. With the "stare" model, we increase the amount of information available for visual matching, so that instances appearing against both similar and different backgrounds can be ranked highly. The proposed techniques are evaluated on the TRECVID INS datasets and achieve large performance gains with small computational overhead compared with several existing methods.
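
To make the topology-checking idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that visual-word matching has already produced index-aligned keypoint lists (point i in the query corresponds to point i in the candidate frame), triangulates each list, and scores the candidate by the fraction of query Delaunay edges that also appear in the candidate triangulation. The function names (`delaunay_edges`, `topology_score`) and the edge-overlap score are illustrative assumptions; the abstract does not specify the actual scoring rule. A brute-force empty-circumcircle test stands in for a real triangulation library so that the example stays self-contained.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of the
    counter-clockwise triangle abc (standard incircle determinant)."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 1e-12


def delaunay_edges(points):
    """Edges (i, j) with i < j of the Delaunay triangulation of `points`.

    Brute force, roughly O(n^4): a triangle is kept when no other point lies
    strictly inside its circumcircle. Only suitable for small point sets.
    """
    n = len(points)
    edges = set()
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        o = _orient(a, b, c)
        if o == 0:
            continue  # collinear triple, not a triangle
        if o < 0:
            b, c = c, b  # make the triangle counter-clockwise
        if not any(_in_circumcircle(a, b, c, points[m])
                   for m in range(n) if m not in (i, j, k)):
            edges.update({(i, j), (i, k), (j, k)})
    return edges


def topology_score(query_pts, cand_pts):
    """Fraction of query Delaunay edges preserved in the candidate.

    Assumption: the two lists are index-aligned by feature matching.
    """
    q_edges = delaunay_edges(query_pts)
    c_edges = delaunay_edges(cand_pts)
    return len(q_edges & c_edges) / len(q_edges) if q_edges else 0.0


# Hypothetical matched keypoints: a square with a center point.
query = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
# Same layout, scaled and shifted: the local topology is preserved.
consistent = [(3 * x + 5, 3 * y + 1) for x, y in query]
# Two matches swapped: the local topology changes.
scrambled = [(1, 1), (2, 0), (2, 2), (0, 2), (0, 0)]

print(topology_score(query, consistent))
print(topology_score(query, scrambled))
```

Because the score depends only on which matched points are neighbors in the triangulation, it tolerates scale changes and mild non-rigid deformation, which is the sense in which such a check is "elastic" compared with fitting a single global homography. For realistic numbers of matches, a library triangulation would replace the brute-force routine.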