Fast content-based retrieval from online photo sharing sites

Authors:
Gerald Schaefer; David Edmundson
Affiliations:
Department of Computer Science, Loughborough University, Loughborough, U.K.; Department of Computer Science, Loughborough University, Loughborough, U.K.
Venue:
AMT'12 Proceedings of the 8th international conference on Active Media Technology
Year:
2012


Abstract

Billions of images have been uploaded to photo sharing sites since their inception, comprising a staggering wealth of visual information. However, effective tools for querying these collections are rare, and those that exist are keyword-based. Since users rarely annotate their images, this approach is of only limited use. Content-based image retrieval (CBIR) extracts features directly from images and bases searches on these features. However, conventional CBIR approaches require a dedicated system that performs feature extraction during photo upload and a database system to store the features, and are hence not available to the average user. In this paper, we present a very fast content-based retrieval method that performs feature extraction on the fly during the retrieval process and can thus be employed client-side on images downloaded from photo sharing sites such as Flickr. Our approach is based on the fact that images uploaded to Flickr are stored in a JPEG format optimised to minimise disk space and bandwidth usage. In particular, we exploit the optimised Huffman compression tables, which are stored in the JPEG headers, as image descriptors. Since, in contrast to other approaches, we have to read only a fraction of the image file and the similarity calculation is of low complexity, our approach is extremely fast, as demonstrated by the bandwidth used to retrieve images from the Flickr photo sharing site. We also show that retrieval performance is nevertheless comparable to that of CBIR using colour histograms, which are at the core of many CBIR systems.
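
The idea of using the Huffman tables in the JPEG header as an image descriptor can be illustrated with a minimal Python sketch. The abstract does not specify the exact descriptor or similarity measure, so the choices here are assumptions made purely for illustration: the descriptor is taken to be the 16 code-length counts of each Huffman table, and two descriptors are compared with an L1 distance. The function names `read_dht_tables`, `huffman_descriptor`, and `descriptor_distance` are hypothetical. The parser stops at the start-of-scan marker, so only the header portion of the file is ever inspected.

```python
import struct

def read_dht_tables(data: bytes) -> dict:
    """Collect Huffman tables from the JPEG header.

    Returns {(table_class, table_id): (bits, values)}, where bits holds the
    16 code-length counts and values the symbols in code order.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):    # start of scan or end of image: header is over
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xC4:            # DHT segment, may hold several tables
            i = 0
            while i + 17 <= len(segment):
                table_class, table_id = segment[i] >> 4, segment[i] & 0x0F
                bits = list(segment[i + 1:i + 17])
                count = sum(bits)
                values = list(segment[i + 17:i + 17 + count])
                tables[(table_class, table_id)] = (bits, values)
                i += 17 + count
        pos += 2 + length
    return tables

def huffman_descriptor(data: bytes) -> dict:
    """Assumed descriptor: the code-length counts of every Huffman table."""
    return {key: bits for key, (bits, _) in read_dht_tables(data).items()}

def descriptor_distance(a: dict, b: dict) -> int:
    """Assumed similarity measure: L1 distance, missing tables count as zeros."""
    zero = [0] * 16
    return sum(
        abs(x - y)
        for key in set(a) | set(b)
        for x, y in zip(a.get(key, zero), b.get(key, zero))
    )
```

In a client-side setting, `data` would only need to contain the leading bytes of each downloaded file up to the start-of-scan marker, and candidate images could then be ranked by `descriptor_distance` to the query image's descriptor. For files encoded with the standard default tables rather than optimised ones, such descriptors would be identical across images and carry no information.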