We introduce the similarity caching problem, a variant of classical caching in which an algorithm can return an element from the cache that is similar, but not necessarily identical, to the query element. We are motivated by buffer management questions in approximate nearest-neighbor applications, especially in the context of caching targeted advertisements on the web. Formally, we assume the queries lie in a metric space with distance function d(·,·). A query p is considered a cache hit if there is a point q in the cache that is sufficiently close to p, i.e., d(p,q) ≤ r for a threshold radius r. The goal is then to minimize the number of cache misses relative to the optimal algorithm. As with classical caching, we use the competitive ratio to measure the performance of different algorithms. Similarity caching is a strict generalization of classical caching, and we show that the problem is intractable unless the algorithm is allowed extra power (either in the size of the cache or in the threshold r) over the optimal offline algorithm. We then quantify the hardness as a function of the complexity of the underlying metric space, showing that the problem becomes easier as we proceed from general metric spaces to those of bounded doubling dimension, and then to Euclidean metrics. Finally, we investigate several extensions of the problem: a threshold r that depends on the query, and a smoother trade-off between the cache-miss cost and the query-query similarity.
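The hit rule d(p,q) ≤ r can be illustrated with a minimal Python sketch. It assumes points are tuples in Euclidean space and uses least-recently-used eviction purely for illustration; the eviction policy, the class name `SimilarityCache`, and its parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import OrderedDict


class SimilarityCache:
    """Illustrative similarity cache: a query hits if some cached point is within `radius`."""

    def __init__(self, capacity, radius, dist=math.dist):
        self.capacity = capacity      # maximum number of cached points
        self.radius = radius          # threshold r
        self.dist = dist              # metric d(.,.); Euclidean by default
        self.points = OrderedDict()   # cached points, least recently used first
        self.misses = 0

    def query(self, p):
        # Linear scan for any cached q with d(p, q) <= r.
        hit = next((q for q in self.points if self.dist(p, q) <= self.radius), None)
        if hit is not None:
            self.points.move_to_end(hit)  # mark as most recently used
            return hit
        # Cache miss: count it, evict if full (LRU is an assumption), then store p.
        self.misses += 1
        if len(self.points) >= self.capacity:
            self.points.popitem(last=False)
        self.points[p] = None
        return p


cache = SimilarityCache(capacity=2, radius=1.0)
cache.query((0.0, 0.0))   # miss: cache is empty
cache.query((0.5, 0.0))   # hit: within distance 1.0 of (0.0, 0.0)
```

Setting `radius=0` makes a hit require an identical point, which recovers classical caching and reflects the sense in which similarity caching generalizes it.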