Similarity caching in large-scale image retrieval

Authors:
Fabrizio Falchi;Claudio Lucchese;Salvatore Orlando;Raffaele Perego;Fausto Rabitti
Affiliations:
I.S.T.I. "A. Faedo"- C.N.R., Via G. Moruzzi 1, 56124 Pisa, Italy;I.S.T.I. "A. Faedo"- C.N.R., Via G. Moruzzi 1, 56124 Pisa, Italy;Università Ca' Foscari Venezia, DAIS, Via Torino, 155 - 30172 Venezia, Italy;I.S.T.I. "A. Faedo"- C.N.R., Via G. Moruzzi 1, 56124 Pisa, Italy;I.S.T.I. "A. Faedo"- C.N.R., Via G. Moruzzi 1, 56124 Pisa, Italy
Venue:
Information Processing and Management: an International Journal
Year:
2012

Abstract

Feature-rich data, such as audio-video recordings, digital images, and results of scientific experiments, nowadays constitute the largest fraction of the massive data sets produced daily in the e-society. Content-based similarity search systems working on such data collections are rapidly growing in importance. Unfortunately, similarity search is in general very expensive and scales poorly. In this paper we study the case of content-based image retrieval (CBIR) systems, and focus on the problem of increasing the throughput of a large-scale CBIR system that indexes a very large collection of digital images. By analyzing the query log of a real CBIR system available on the Web, we characterize the behavior of users who experience a novel search paradigm, where content-based similarity queries and text-based ones can easily be interleaved. We show that locality and self-similarity are present even in the stream of queries submitted to such a CBIR system. Based on these results, we propose an effective way to exploit this locality by means of a similarity caching system, which stores recently or frequently submitted queries together with their results. Unlike traditional caching, the proposed cache can manage not only exact hits, but also approximate ones, which are answered by similarity with respect to the result sets of past queries present in the cache. We extensively evaluate the proposed solution using the real query stream recorded in the log and a collection of 100 million digital photographs. The high hit ratios and small average approximation errors obtained demonstrate the effectiveness of the approach.
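The distinction between exact and approximate hits can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `SimilarityCache`, the fixed distance `threshold` that decides when a cached query is close enough, the Euclidean distance, and the LRU eviction policy are all assumptions made for illustration, since the abstract does not specify them. On an approximate hit, the sketch re-ranks the result set of the nearest cached query by distance to the new query.

```python
from collections import OrderedDict
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SimilarityCache:
    """Hypothetical LRU cache answering exact and approximate similarity queries."""

    def __init__(self, capacity, threshold, k, distance=euclidean):
        self.capacity = capacity    # maximum number of cached queries (assumed policy)
        self.threshold = threshold  # assumed cutoff for accepting an approximate hit
        self.k = k                  # number of results returned per query
        self.distance = distance
        self._entries = OrderedDict()  # query vector -> list of result vectors

    def lookup(self, query):
        """Return ("exact" | "approximate" | "miss", results or None)."""
        query = tuple(query)
        if query in self._entries:
            # Exact hit: the same query was answered before.
            self._entries.move_to_end(query)
            return "exact", self._entries[query][: self.k]

        # Otherwise find the cached query closest to the new one (linear scan).
        best_key, best_dist = None, float("inf")
        for key in self._entries:
            d = self.distance(query, key)
            if d < best_dist:
                best_key, best_dist = key, d

        if best_key is not None and best_dist <= self.threshold:
            # Approximate hit: re-rank the cached result set for the new query.
            self._entries.move_to_end(best_key)
            ranked = sorted(self._entries[best_key],
                            key=lambda obj: self.distance(query, obj))
            return "approximate", ranked[: self.k]

        return "miss", None

    def insert(self, query, results):
        """Store results computed by the back-end index, evicting the LRU entry."""
        query = tuple(query)
        self._entries[query] = [tuple(r) for r in results]
        self._entries.move_to_end(query)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


# Usage: one cached query, then an exact, an approximate, and a missed lookup.
cache = SimilarityCache(capacity=2, threshold=0.5, k=2)
cache.insert((0.0, 0.0), [(0.1, 0.0), (0.0, 0.3), (1.0, 1.0)])
exact = cache.lookup((0.0, 0.0))
approx = cache.lookup((0.2, 0.0))
miss = cache.lookup((5.0, 5.0))
```

In this sketch a miss would be forwarded to the underlying index and the answer stored with `insert`. The linear scan over cached queries is chosen only for brevity; a cache of realistic size would need an index over the cached queries as well.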