Feature-rich data, such as audio-video recordings, digital images, and the results of scientific experiments, now constitute the largest fraction of the massive data sets produced daily in the e-society. Content-based similarity search systems operating on such collections are rapidly growing in importance. Unfortunately, similarity search is generally very expensive and scales poorly. In this paper we study content-based image retrieval (CBIR) systems and focus on increasing the throughput of a large-scale CBIR system that indexes a very large collection of digital images. By analyzing the query log of a real CBIR system available on the Web, we characterize the behavior of users exposed to a novel search paradigm in which content-based similarity queries and text-based queries can easily be interleaved. We show that locality and self-similarity are present even in the stream of queries submitted to such a system. Based on these results, we propose an effective way to exploit this locality by means of a similarity caching system that stores recently or frequently submitted queries together with their results. Unlike a traditional cache, the proposed cache handles not only exact hits but also approximate ones, which are answered by similarity with respect to the result sets of past queries stored in the cache. We evaluate the proposed solution extensively using the real query stream recorded in the log and a collection of 100 million digital photographs. The high hit ratios and small average approximation errors obtained demonstrate the effectiveness of the approach.
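The abstract does not spell out how the cache resolves hits, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes queries and images are plain feature vectors compared with Euclidean distance, that an exact hit means an identical query vector, that an approximate hit occurs when some cached query lies within a fixed distance `threshold` of the new query, and that eviction is LRU. The approximate answer pools the result sets of all nearby cached queries and re-ranks them by distance to the new query; this is one plausible reading of "solved by similarity with respect to the result sets of past queries", not necessarily the authors' method. All names (`SimilarityCache`, `search`, `brute_force_knn`, `threshold`) are hypothetical, and the brute-force scan merely stands in for a real CBIR index.

```python
import math
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple

# Hypothetical types: a feature vector and a (image_id, vector) result entry.
Vector = Tuple[float, ...]
Result = Tuple[str, Vector]


class SimilarityCache:
    """Hypothetical LRU similarity cache for k-NN image queries."""

    def __init__(self, capacity: int, k: int, threshold: float):
        self.capacity = capacity    # max number of cached queries
        self.k = k                  # results returned per query
        self.threshold = threshold  # assumed max query distance for an approximate hit
        # Cached query vector -> its k-NN result list, kept in LRU order.
        self.entries: "OrderedDict[Vector, List[Result]]" = OrderedDict()

    def lookup(self, q: Vector) -> Tuple[str, Optional[List[str]]]:
        # Exact hit: the identical query vector was cached earlier.
        if q in self.entries:
            self.entries.move_to_end(q)
            return "exact", [oid for oid, _ in self.entries[q]]
        # Approximate hit: some cached query is close enough to q.
        near = [key for key in self.entries if math.dist(q, key) <= self.threshold]
        if not near:
            return "miss", None
        # Pool the result sets of nearby cached queries and re-rank them by
        # distance to q; true neighbors missing from every pooled set are lost,
        # which is where the approximation error comes from.
        pool: Dict[str, Vector] = {}
        for key in near:
            self.entries.move_to_end(key)
            for oid, vec in self.entries[key]:
                pool[oid] = vec
        ranked = sorted(pool, key=lambda oid: math.dist(q, pool[oid]))
        return "approx", ranked[: self.k]

    def insert(self, q: Vector, results: List[Result]) -> None:
        self.entries[q] = results[: self.k]
        self.entries.move_to_end(q)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used query


def brute_force_knn(q: Vector, dataset: Dict[str, Vector], k: int) -> List[Result]:
    # Stand-in for the expensive similarity search performed on a cache miss.
    return sorted(dataset.items(), key=lambda item: math.dist(q, item[1]))[:k]


def search(cache: SimilarityCache, q: Vector, dataset: Dict[str, Vector]):
    kind, ids = cache.lookup(q)
    if kind != "miss":
        return kind, ids
    results = brute_force_knn(q, dataset, cache.k)
    cache.insert(q, results)
    return "miss", [oid for oid, _ in results]


if __name__ == "__main__":
    data = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "d": (5.0, 5.0)}
    c = SimilarityCache(capacity=2, k=2, threshold=0.5)
    for query in [(0.0, 0.0), (0.0, 0.0), (0.1, 0.0)]:
        print(query, search(c, query, data))
```

On the toy data, the first query is a miss, the repeated query is an exact hit, and the nearby query `(0.1, 0.0)` is answered approximately from the cached result set without touching the dataset. The fixed distance threshold is the simplest possible hit rule; it trades answer quality for hit ratio, which mirrors the hit-ratio versus approximation-error trade-off the abstract reports.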