Detection of near-duplicate images for web search

Authors:
Jun Jie Foo;Justin Zobel;Ranjan Sinha;S. M. M. Tahaghoghi
Affiliations:
RMIT University, Melbourne, Victoria, Australia;RMIT University, Melbourne, Victoria, Australia;The University of Melbourne, Australia;RMIT University, Melbourne, Victoria, Australia
Venue:
Proceedings of the 6th ACM international conference on Image and video retrieval
Year:
2007

Abstract

Among the vast numbers of images on the web are many duplicates and near-duplicates, that is, variants derived from the same original image. Such near-duplicates appear in many web image searches and may represent infringements of copyright or indicate the presence of redundancy. While methods for identifying near-duplicates have been investigated, there has been no analysis of the kinds of alterations that are common on the web or evaluation of whether real cases of near-duplication can in fact be identified. In this paper we use popular queries and a commercial image search service to collect images that we then manually analyse for instances of near-duplication. We show that such duplication is indeed significant, but that not all kinds of image alteration explored in previous literature are evident in web data. Removal of near-duplicates from a collection is impractical, but we propose that they be removed from sets of answers. We evaluate our technique for automatic identification of near-duplicates during query evaluation and show that it has promise as an effective mechanism for management of near-duplication in practice.