Caching search engine results over incremental indices

  • Authors:
  • Roi Blanco;Edward Bortnikov;Flavio Junqueira;Ronny Lempel;Luca Telloli;Hugo Zaragoza

  • Affiliations:
  • Yahoo! Research, Barcelona, Spain;Yahoo! Labs, Haifa, Israel;Yahoo! Research, Barcelona, Spain;Yahoo! Labs, Haifa, Israel;Barcelona Supercomputing Center, Barcelona, Spain;Yahoo! Research, Barcelona, Spain

  • Venue:
  • Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naive approaches, such as flushing the entire cache upon every index update, lead to poor performance and in fact, render caching futile when the frequency of updates is high. Solving the invalidation problem efficiently corresponds to predicting accurately which queries will produce different results if re-evaluated, given the actual changes to the index. To obtain this property, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over Wikipedia documents using a query log from the Yahoo! search engine shows that selective invalidation of cached search results can lower the number of unnecessary query evaluations by as much as 30% compared to a baseline scheme, while returning results of similar freshness. In general, our predictors enable fewer unnecessary invalidations and fewer stale results compared to a TTL-only scheme for similar freshness of results.