Approximate nearest neighbors: towards removing the curse of dimensionality
Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC '98)
Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision
Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC '04)
An efficient parts-based near-duplicate and sub-image retrieval system
Proceedings of the 12th Annual ACM International Conference on Multimedia
Scalable Recognition with a Vocabulary Tree
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR '06)
Local Graph Partitioning using PageRank Vectors
Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06)
Detecting near-duplicates for web crawling
Proceedings of the 16th International Conference on World Wide Web
Scalable near identical image and shot detection
Proceedings of the 6th ACM International Conference on Image and Video Retrieval
Multi-probe LSH: efficient indexing for high-dimensional similarity search
Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07)
Efficiently matching sets of features with random histograms
Proceedings of the 16th ACM International Conference on Multimedia (MM '08)
Descriptive visual words and visual phrases for image applications
Proceedings of the 17th ACM International Conference on Multimedia (MM '09)
Evaluation of GIST descriptors for web-scale image search
Proceedings of the ACM International Conference on Image and Video Retrieval
VLFeat: an open and portable library of computer vision algorithms
Proceedings of the International Conference on Multimedia
Proceedings of the 3rd ACM International Conference on Multimedia Retrieval
In this paper, we propose two techniques for high-confidence, large-scale near-duplicate image detection. First, we show that entropy-based filtering eliminates the ambiguous SIFT features that cause most false positives, so that a single match between the retained high-quality features is enough to claim near-duplicity. Second, we show that graph cut on a duplicity graph computed offline can be used for query expansion, substantially improving search quality. Evaluation on web images shows that, when combined with sketch embedding [6], our methods achieve false positive rates orders of magnitude lower than the standard visual word approach. We demonstrate the proposed techniques with a large-scale image search engine that serves more than 50 million web images from a single commodity server, using an index data structure computed offline on a Hadoop cluster.
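The entropy filter and single-match rule described in the abstract can be sketched as follows. This is a minimal illustration under clearly stated assumptions, not the authors' implementation: the abstract does not define the entropy measure, so the sketch assumes Shannon entropy of the histogram of a descriptor's 128 component values (taken as integers in 0..255 and bucketed into 16 bins); the threshold `min_entropy` and match radius `max_dist` are hypothetical; and plain Euclidean distance stands in for the sketch embedding of [6], which is not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence

Descriptor = Sequence[int]  # one SIFT descriptor: 128 values in 0..255 (assumed)


def descriptor_entropy(desc: Descriptor, num_bins: int = 16, max_value: int = 255) -> float:
    """Shannon entropy (bits) of the histogram of a descriptor's component values.

    ASSUMPTION: this entropy definition is illustrative; the paper's exact
    measure is not given in the abstract.
    """
    n = len(desc)
    # Bucket each component value into one of num_bins equal-width bins.
    counts = Counter(min(v * num_bins // (max_value + 1), num_bins - 1) for v in desc)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def filter_features(descs: List[Descriptor], min_entropy: float = 2.0) -> List[Descriptor]:
    """Drop low-entropy (ambiguous) descriptors; min_entropy is a hypothetical threshold."""
    return [d for d in descs if descriptor_entropy(d) >= min_entropy]


def l2(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_near_duplicate(query: List[Descriptor], candidate: List[Descriptor],
                      min_entropy: float = 2.0, max_dist: float = 100.0) -> bool:
    """Declare near-duplicity on a SINGLE match between retained features.

    ASSUMPTION: plain Euclidean distance stands in for the sketch embedding
    of [6]; max_dist is a hypothetical match radius.
    """
    q = filter_features(query, min_entropy)
    c = filter_features(candidate, min_entropy)
    return any(l2(a, b) <= max_dist for a in q for b in c)


if __name__ == "__main__":
    blank = [0] * 128                                # flat region: entropy 0
    textured = [(i * 37) % 256 for i in range(128)]  # 128 distinct values
    shifted = [min(v + 1, 255) for v in textured]    # slightly perturbed copy
    print(is_near_duplicate([blank], [blank]))              # False: both filtered out
    print(is_near_duplicate([textured, blank], [shifted]))  # True
```

In this toy run the blank descriptor matches itself exactly yet never counts as evidence, which is the behavior the filtering step is meant to produce; on real data the bin count, threshold, and radius would all need tuning.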