Detection of near-duplicate images for web search

  • Authors:
  • Jun Jie Foo; Justin Zobel; Ranjan Sinha; S. M. M. Tahaghoghi

  • Affiliations:
  • RMIT University, Melbourne, Victoria, Australia; RMIT University, Melbourne, Victoria, Australia; The University of Melbourne, Australia; RMIT University, Melbourne, Victoria, Australia

  • Venue:
  • Proceedings of the 6th ACM international conference on Image and video retrieval
  • Year:
  • 2007

Abstract

Among the vast numbers of images on the web are many duplicates and near-duplicates, that is, variants derived from the same original image. Such near-duplicates appear in many web image searches and may represent infringements of copyright or indicate the presence of redundancy. While methods for identifying near-duplicates have been investigated, there has been no analysis of the kinds of alterations that are common on the web or evaluation of whether real cases of near-duplication can in fact be identified. In this paper we use popular queries and a commercial image search service to collect images that we then manually analyse for instances of near-duplication. We show that such duplication is indeed significant, but that not all kinds of image alteration explored in previous literature are evident in web data. Removal of near-duplicates from a collection is impractical, but we propose that they be removed from sets of answers. We evaluate our technique for automatic identification of near-duplicates during query evaluation and show that it has promise as an effective mechanism for management of near-duplication in practice.
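
The idea of removing near-duplicates from a set of answers, rather than from the whole collection, can be illustrated with a minimal sketch. The abstract does not describe the authors' matching technique, so the code below substitutes a toy average-hash signature purely as a stand-in; the names `average_hash`, `hamming`, `dedupe_answers` and the `max_distance` threshold are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# Toy grayscale image: rows of 0-255 pixel values (an assumption for this sketch).
Image = Sequence[Sequence[int]]
Signature = Tuple[bool, ...]


def average_hash(image: Image) -> Signature:
    """Stand-in signature: one bit per pixel, set when the pixel exceeds the mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)


def hamming(a: Signature, b: Signature) -> int:
    """Count the positions at which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def dedupe_answers(
    ranked: Sequence[Tuple[str, Image]], max_distance: int = 1
) -> List[str]:
    """Walk a ranked answer list and keep an image only if its signature is
    more than max_distance bits away from every image already kept."""
    kept: List[Tuple[str, Signature]] = []
    for name, image in ranked:
        sig = average_hash(image)
        if all(hamming(sig, other) > max_distance for _, other in kept):
            kept.append((name, sig))
    return [name for name, _ in kept]
```

Because the filter runs only over the answers returned for one query, the cost scales with the size of the answer set and not with the size of the collection. A real system would replace the toy signature with a descriptor that is robust to the alterations actually observed in web images.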