Efficient and robust near-duplicate detection in large and growing image data-sets

  • Authors:
  • Thomas Pönitz; Julian Stöttinger

  • Affiliations:
  • CogVis Ltd., Vienna, Austria; Vienna Technical University, Vienna, Austria

  • Venue:
  • Proceedings of the international conference on Multimedia
  • Year:
  • 2010

Abstract

With the growing flood of digital images and steadily increasing storage capacity, large-scale image databases have become common. This work addresses the problem of finding replicas (near-duplicates) in image databases containing more than 100,000 images. A clustering algorithm is developed that runs in linear time and can be carried out in parallel. We observe that as the database grows, discrimination between high frequency images decreases: features of images with natural repetitive texture become similar to those of other images, so these images show up in most search results. We address this problem with an asymmetric Hamming distance measure for bags of visual words. It provides better discriminative power in large databases while remaining robust to image transformations such as rotation, cropping, or changes in resolution and size.
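
The abstract does not define the asymmetric Hamming distance, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes each image is represented as a binary bag of visual words (a set of visual-word IDs), and that visual words missing from the candidate are penalized more heavily than extra words the candidate contains. The function names and the parameters `miss_weight` and `extra_weight` are hypothetical.

```python
from typing import FrozenSet


def hamming(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    """Symmetric Hamming distance between two binary bags of visual words.

    Each bag is the set of visual-word IDs present in the image, so the
    distance is the number of words present in exactly one of the two bags.
    """
    return len(a ^ b)


def asymmetric_hamming(
    query: FrozenSet[int],
    candidate: FrozenSet[int],
    miss_weight: float = 1.0,   # hypothetical weight, not from the paper
    extra_weight: float = 0.0,  # hypothetical weight, not from the paper
) -> float:
    """One possible asymmetric variant (an assumption for illustration).

    Words of the query that the candidate lacks are weighted by
    miss_weight; words the candidate has in addition are weighted by
    extra_weight. With different weights, d(q, c) != d(c, q) in general.
    """
    missing = len(query - candidate)
    extra = len(candidate - query)
    return miss_weight * missing + extra_weight * extra


query = frozenset({1, 2, 3})
candidate = frozenset({2, 3, 4, 5, 6})

symmetric = hamming(query, candidate)
forward = asymmetric_hamming(query, candidate)
backward = asymmetric_hamming(candidate, query)
print(symmetric, forward, backward)
```

In this toy example the symmetric distance treats the pair the same in both directions, whereas the asymmetric variant gives a different value depending on which image is the query. Whether and how such weighting reduces the dominance of repetitive-texture images is a question for the paper itself.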