Motivated by contextual advertising systems and other web applications involving efficiency-accuracy tradeoffs, we study similarity caching. Here, a cache hit is said to occur if the requested item is similar, but not necessarily equal, to some cached item. We study two objectives that dictate the efficiency-accuracy tradeoff and provide caching policies for each of them. Through extensive experiments on real data, we show that similarity caching can significantly improve the efficiency of contextual advertising systems with minimal impact on accuracy. Inspired by these results, we propose a simple generative model that embodies two fundamental characteristics of page requests arriving at advertising systems, namely, long-range dependences and similarities. We provide theoretical bounds on the gains of similarity caching in this model and demonstrate these gains empirically by fitting the actual data to the model.
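To make the notion of a similarity cache hit concrete, the following is a minimal sketch and not the caching policies studied in the paper. It assumes that items are points in Euclidean space, that "similar" means within a fixed distance `radius`, and that eviction is plain LRU. The class name `SimilarityCache` and its parameters are hypothetical, chosen only for illustration.

```python
import math
from collections import OrderedDict


class SimilarityCache:
    """Illustrative similarity cache (assumed design: distance threshold + LRU)."""

    def __init__(self, capacity, radius):
        self.capacity = capacity      # maximum number of cached items
        self.radius = radius          # assumed similarity threshold
        self._store = OrderedDict()   # point (tuple) -> cached value, in LRU order

    def get(self, query):
        """Return the value of the nearest cached point within `radius`, else None."""
        best_key, best_dist = None, math.inf
        # Linear scan for clarity; a real system would likely use an index such as LSH.
        for key in self._store:
            d = math.dist(key, query)
            if d < best_dist:
                best_key, best_dist = key, d
        if best_key is not None and best_dist <= self.radius:
            self._store.move_to_end(best_key)  # mark as most recently used
            return self._store[best_key]       # similarity hit
        return None                            # miss

    def put(self, point, value):
        """Insert an item, evicting the least recently used one when full."""
        point = tuple(point)
        self._store[point] = value
        self._store.move_to_end(point)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)


cache = SimilarityCache(capacity=2, radius=0.5)
cache.put((0.0, 0.0), "ads-A")
near = cache.get((0.3, 0.0))   # within radius of a cached point: served from cache
far = cache.get((2.0, 2.0))    # no cached point is close enough: miss
```

In this sketch a larger `radius` raises the hit rate (efficiency) but returns answers computed for less similar requests (accuracy), which is the tradeoff the abstract describes.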