High-confidence near-duplicate image detection

Authors:
Wei Dong;Zhe Wang;Moses Charikar;Kai Li
Affiliations:
Independent Researcher, Ann Arbor MI;Princeton University, Princeton, NJ;Princeton University, Princeton, NJ;Princeton University, Princeton, NJ
Venue:
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Year:
2012

References (13):
Approximate nearest neighbors: towards removing the curse of dimensionality. STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing.
Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision.
Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing.
An efficient parts-based near-duplicate and sub-image retrieval system. Proceedings of the 12th annual ACM international conference on Multimedia.
Scalable Recognition with a Vocabulary Tree. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2.
Local Graph Partitioning using PageRank Vectors. FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science.
Detecting near-duplicates for web crawling. Proceedings of the 16th international conference on World Wide Web.
Scalable near identical image and shot detection. Proceedings of the 6th ACM international conference on Image and video retrieval.
Multi-probe LSH: efficient indexing for high-dimensional similarity search. VLDB '07 Proceedings of the 33rd international conference on Very large data bases.
Efficiently matching sets of features with random histograms. MM '08 Proceedings of the 16th ACM international conference on Multimedia.
Descriptive visual words and visual phrases for image applications. MM '09 Proceedings of the 17th ACM international conference on Multimedia.
Evaluation of GIST descriptors for web-scale image search. Proceedings of the ACM International Conference on Image and Video Retrieval.
Vlfeat: an open and portable library of computer vision algorithms. Proceedings of the international conference on Multimedia.

Cited by (1):
Twitter's visual pulse. Proceedings of the 3rd ACM conference on International conference on multimedia retrieval.

Abstract

In this paper, we propose two techniques for near-duplicate image detection at high confidence and large scale. First, we show that entropy-based filtering eliminates the ambiguous SIFT features that cause most false positives, and makes it possible to claim near-duplicity from a single match between the retained high-quality features. Second, we show that graph cut can be used for query expansion, with a duplicity graph computed offline, to substantially improve search quality. Evaluation with web images shows that, when combined with sketch embedding [6], our methods achieve a false positive rate orders of magnitude lower than the standard visual word approach. We demonstrate the proposed techniques with a large-scale image search engine which, using an index data structure computed offline on a Hadoop cluster, is capable of serving more than 50 million web images from a single commodity server.
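
The abstract does not say how the entropy of a SIFT feature is measured, so the following is only a minimal sketch of what entropy-based filtering could look like. It assumes that entropy is the Shannon entropy of the empirical distribution of a descriptor's component values, and that features scoring below a cutoff are discarded as ambiguous. The function names and the cutoff `min_entropy` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def descriptor_entropy(descriptor):
    """Shannon entropy, in bits, of the empirical distribution of the
    descriptor's component values (an assumed definition, not the paper's)."""
    n = len(descriptor)
    counts = Counter(descriptor)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def filter_by_entropy(descriptors, min_entropy):
    """Keep only descriptors whose entropy is at least min_entropy.
    min_entropy is a hypothetical tuning parameter."""
    return [d for d in descriptors if descriptor_entropy(d) >= min_entropy]


# Two toy 128-dimensional descriptors with integer components.
flat = [0] * 128            # every component identical: no information
varied = list(range(128))   # every component distinct: maximal entropy

kept = filter_by_entropy([flat, varied], min_entropy=4.0)
```

Under this assumed definition, a descriptor whose components are all equal scores zero and is dropped, while a descriptor with widely spread component values is retained; a real system would need to tune the cutoff on labeled data.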