Discovery of image versions in large collections

Authors:
Jun Jie Foo;Ranjan Sinha;Justin Zobel
Affiliations:
School of Computer Science & IT, RMIT University, Melbourne, Australia;School of Computer Science & IT, RMIT University, Melbourne, Australia;School of Computer Science & IT, RMIT University, Melbourne, Australia
Venue:
MMM'07 Proceedings of the 13th International conference on Multimedia Modeling - Volume Part II
Year:
2007

References (19):
Approximate nearest neighbors: towards removing the curse of dimensionality. STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Syntactic clustering of the Web. Selected papers from the sixth international conference on World Wide Web
Multi-scale sub-image search. MULTIMEDIA '99 Proceedings of the seventh ACM international conference on Multimedia (Part 2)
Content-Based Image Retrieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence
Duplicate detection in consumer photography and news video. Proceedings of the tenth ACM international conference on Multimedia
Similarity Search in High Dimensions via Hashing. VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Finding Near-Replicas of Documents and Servers on the Web. WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Content based sub-image retrieval via hierarchical tree matching. MMDB '03 Proceedings of the 1st ACM international workshop on Multimedia databases
Scale & Affine Invariant Interest Point Detectors. International Journal of Computer Vision
Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision
An efficient parts-based near-duplicate and sub-image retrieval system. Proceedings of the 12th annual ACM international conference on Multimedia
Detecting image near-duplicate by stochastic attributed relational graph matching with learning. Proceedings of the 12th annual ACM international conference on Multimedia
The SPIRIT collection: an overview of a large web collection. ACM SIGIR Forum
Enhanced Perceptual Distance Functions and Indexing for Image Replica Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence
Inverted files for text search engines. ACM Computing Surveys (CSUR)
Pruning SIFT for scalable near-duplicate image matching. ADC '07 Proceedings of the eighteenth conference on Australasian database - Volume 63
An image watermarking algorithm robust to geometric distortion. IWDW'02 Proceedings of the 1st international conference on Digital watermarking
PCA-SIFT: a more distinctive representation for local image descriptors. CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression. IEEE Transactions on Circuits and Systems for Video Technology

Cited by (4):
Clustering near-duplicate images in large collections. Proceedings of the international workshop on Workshop on multimedia information retrieval
Large scale image copy detection evaluation. MIR '08 Proceedings of the 1st ACM international conference on Multimedia information retrieval
Using redundant bit vectors for near-duplicate image detection. DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
Efficient incremental near duplicate detection based on locality sensitive hashing. DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I

Abstract

Image collections may contain multiple copies, versions, and fragments of the same image. Storage or retrieval of such duplicates and near-duplicates may be unnecessary and, in the context of collections derived from the web, their presence may represent infringements of copyright. However, identifying image versions is challenging, because versions can be subject to a wide range of digital alterations, and potentially costly, because the number of image pairs to be considered is quadratic in collection size. In this paper, we propose a method for finding pairs of near-duplicates based on manipulation of an image index. Our approach adapts a robust object recognition technique and a near-duplicate document detection algorithm to this application domain. We show that this method requires only moderate computing resources and is highly effective at identifying pairs of near-duplicates.
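
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of finding candidate pairs through an index rather than by comparing all n(n-1)/2 image pairs. It assumes each image has already been reduced to a set of integer feature IDs (hypothetical stand-ins for quantized local descriptors); the function name `candidate_pairs`, the `min_shared` threshold, and the sample file names are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(image_features, min_shared=2):
    """Return {(a, b): shared_count} for image pairs sharing >= min_shared feature IDs.

    image_features maps an image ID to an iterable of feature IDs
    (assumed here to be precomputed, e.g. quantized local descriptors).
    """
    # Inverted index: feature ID -> set of images that contain it.
    index = defaultdict(set)
    for image_id, features in image_features.items():
        for feature in set(features):
            index[feature].add(image_id)

    # Count co-occurrences only for images that appear in the same
    # posting list, so unrelated pairs are never examined.
    shared = defaultdict(int)
    for postings in index.values():
        for a, b in combinations(sorted(postings), 2):
            shared[(a, b)] += 1

    return {pair: count for pair, count in shared.items() if count >= min_shared}


# Hypothetical toy collection: two versions of one image plus two others.
collection = {
    "a.jpg": [1, 2, 3, 4],
    "a_cropped.jpg": [2, 3, 4, 9],
    "b.jpg": [5, 6, 7],
    "c.jpg": [7, 8],
}
pairs = candidate_pairs(collection, min_shared=2)
print(pairs)
```

In this toy collection only the pair sharing at least two feature IDs is reported; the pair that shares a single feature falls below the threshold. The cost depends on posting-list lengths rather than on the square of the collection size, although very common features would still produce long lists and would need pruning in practice.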