Clustering near-duplicate images in large collections

Authors:
Jun Jie Foo;Justin Zobel;Ranjan Sinha
Affiliations:
RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia;University of Melbourne, Melbourne, Australia
Venue:
Proceedings of the International Workshop on Multimedia Information Retrieval
Year:
2007

Abstract

Near-duplicate images introduce problems of redundancy and copyright infringement in large image collections. The problem is acute on the Web, where appropriation of images without acknowledgment of source is prevalent. In this paper, we present an effective clustering approach for near-duplicate images that combines invariant local image descriptors with an adaptation of near-duplicate text-document clustering techniques, extending our earlier approach to pairwise identification of near-duplicate images. We demonstrate that our clustering approach is highly effective for collections of up to a few hundred thousand images. We also show, via experimentation with real examples, that our approach presents a viable solution for clustering near-duplicate images on the Web.