Nearest-neighbor caching for content-match applications

Authors:
Sandeep Pandey;Andrei Broder;Flavio Chierichetti;Vanja Josifovski;Ravi Kumar;Sergei Vassilvitskii
Affiliations:
Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA
Venue:
Proceedings of the 18th International Conference on World Wide Web
Year:
2009

Abstract

Motivated by contextual advertising systems and other web applications involving efficiency-accuracy tradeoffs, we study similarity caching. Here, a cache hit is said to occur if the requested item is similar, but not necessarily equal, to some cached item. We study two objectives that dictate the efficiency-accuracy tradeoff and provide caching policies for each. Through extensive experiments on real data, we show that similarity caching can significantly improve the efficiency of contextual advertising systems with minimal impact on accuracy. Inspired by these results, we propose a simple generative model that embodies two fundamental characteristics of page requests arriving at advertising systems: long-range dependences and similarities. We provide theoretical bounds on the gains of similarity caching in this model and demonstrate these gains empirically by fitting the actual data to the model.
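
The notion of a similarity cache hit described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: items are modeled as sets of terms, similarity is Jaccard similarity, a hit requires similarity at or above a fixed `threshold`, and eviction is plain LRU. The class name `SimilarityCache` and the function `jaccard` are hypothetical, and the lookup is a linear scan rather than an indexed nearest-neighbor search.

```python
from collections import OrderedDict


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class SimilarityCache:
    """Illustrative similarity cache: a lookup hits when some cached key
    is similar enough to the request, not only when it is identical.
    Assumptions (not from the paper): Jaccard similarity, fixed threshold,
    LRU eviction, linear scan over cached keys."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        self.entries = OrderedDict()  # frozenset of terms -> cached result

    def get(self, terms):
        """Return the result of the most similar cached key, or None on a miss."""
        query = frozenset(terms)
        best_key, best_sim = None, 0.0
        for key in self.entries:
            sim = jaccard(query, key)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is not None and best_sim >= self.threshold:
            self.entries.move_to_end(best_key)  # mark as recently used
            return self.entries[best_key]
        return None

    def put(self, terms, result):
        """Insert a result, evicting the least recently used entry if full."""
        key = frozenset(terms)
        self.entries[key] = result
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


cache = SimilarityCache(capacity=2, threshold=0.5)
cache.put({"cheap", "flights", "paris"}, "ads-A")
near = cache.get({"cheap", "flights", "rome"})  # shares 2 of 4 distinct terms
far = cache.get({"hotel", "rome"})              # shares no terms with the cached key
```

In this sketch, raising `threshold` trades efficiency for accuracy: fewer requests are served from the cache, but the served results come from keys closer to the request. A threshold of 1.0 reduces the structure to an ordinary exact-match LRU cache.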