Clustering near-duplicate images in large collections

  • Authors:
  • Jun Jie Foo; Justin Zobel; Ranjan Sinha

  • Affiliations:
  • RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia; University of Melbourne, Melbourne, Australia

  • Venue:
  • Proceedings of the International Workshop on Multimedia Information Retrieval
  • Year:
  • 2007

Abstract

Near-duplicate images introduce problems of redundancy and copyright infringement in large image collections. The problem is acute on the web, where appropriation of images without acknowledgment of source is prevalent. In this paper, we present an effective clustering approach for near-duplicate images that combines invariant local image descriptors with an adaptation of near-duplicate text-document clustering techniques; it extends our earlier approach to pairwise identification of near-duplicate images. We demonstrate that our clustering approach is highly effective for collections of up to a few hundred thousand images. We also show, via experimentation with real examples, that our approach presents a viable solution for clustering near-duplicate images on the web.
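
The abstract does not spell out how pairwise near-duplicate matches are turned into clusters. As a minimal sketch of one common way to do this, and not necessarily the authors' method, the Python below merges matched pairs into groups with a union-find structure, so that images linked directly or transitively end up in the same cluster. The function name `cluster_from_pairs`, the image identifiers, and the input format are all hypothetical. The matched pairs are assumed to come from some separate pairwise detector, such as one based on local descriptors.

```python
from collections import defaultdict


def cluster_from_pairs(image_ids, matched_pairs):
    """Group images into clusters from pairwise near-duplicate matches.

    Illustrative assumption: clusters are the connected components of the
    match graph. The paper's actual clustering rule may differ.
    """
    # Each image starts as the root of its own singleton set.
    parent = {img: img for img in image_ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the sets containing a and b, if they are distinct.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in matched_pairs:
        union(a, b)

    # Collect members under their root identifier.
    groups = defaultdict(list)
    for img in image_ids:
        groups[find(img)].append(img)

    # Keep only real clusters (two or more images), in a stable order.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


if __name__ == "__main__":
    images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
    pairs = [("a.jpg", "b.jpg"), ("b.jpg", "c.jpg")]
    print(cluster_from_pairs(images, pairs))
    # prints [['a.jpg', 'b.jpg', 'c.jpg']]
```

A transitive grouping like this can chain loosely related images together through intermediate matches, so a practical system would likely need a stricter pairwise threshold or a post-filtering step. That trade-off is an observation about the sketch, not a claim from the paper.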