Caching search engine results over incremental indices

Authors:
Roi Blanco;Edward Bortnikov;Flavio Junqueira;Ronny Lempel;Luca Telloli;Hugo Zaragoza
Affiliations:
Yahoo! Research, Barcelona, Spain;Yahoo! Labs, Haifa, Israel;Yahoo! Research, Barcelona, Spain;Yahoo! Labs, Haifa, Israel;Barcelona Supercomputing Center, Barcelona, Spain;Yahoo! Research, Barcelona, Spain
Venue:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Year:
2010

Abstract

A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naive approaches, such as flushing the entire cache upon every index update, lead to poor performance and, in fact, render caching futile when the frequency of updates is high. Solving the invalidation problem efficiently corresponds to predicting accurately which queries will produce different results if re-evaluated, given the actual changes to the index. To this end, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over Wikipedia documents using a query log from the Yahoo! search engine shows that selective invalidation of cached search results can lower the number of unnecessary query evaluations by as much as 30% compared to a baseline scheme, while returning results of similar freshness. In general, our predictors enable fewer unnecessary invalidations and fewer stale results compared to a TTL-only scheme for similar freshness of results.
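
To make the contrast between TTL-based expiry and selective invalidation concrete, the following is a minimal sketch, not the predictors described in the paper. The class `ResultCache`, its method names, and the invalidation rule (drop an entry if the updated document already appears in its results, or if every query term occurs in the updated document) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    results: list      # cached result document ids
    cached_at: float   # time at which the entry was stored


class ResultCache:
    """Hypothetical result cache with a TTL baseline and a toy predictor."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # query string -> CacheEntry

    def put(self, query, results, now):
        self.entries[query] = CacheEntry(list(results), now)

    def get(self, query, now):
        # TTL baseline: an entry older than ttl is treated as stale,
        # whether or not its results actually changed.
        entry = self.entries.get(query)
        if entry is None:
            return None
        if now - entry.cached_at > self.ttl:
            del self.entries[query]
            return None
        return entry.results

    def invalidate_for_update(self, doc_id, doc_terms):
        # Toy predictor (an assumption, not the paper's): an index update
        # to doc_id invalidates entries that already return doc_id, or
        # whose query terms all occur in the updated document.
        terms = set(doc_terms)
        stale = [
            query
            for query, entry in self.entries.items()
            if doc_id in entry.results or set(query.split()) <= terms
        ]
        for query in stale:
            del self.entries[query]
        return stale


cache = ResultCache(ttl=10)
cache.put("web cache", [1, 2], now=0)
cache.put("search engine", [3], now=0)

# Document 7 is updated and now contains the terms below.
dropped = cache.invalidate_for_update(7, ["web", "cache", "index"])
```

In this sketch only the entry for "web cache" is dropped by the update, while "search engine" stays cached until its TTL runs out. A predictor of this kind can err in both directions (invalidating entries whose results did not change, or missing ones that did), which is the trade-off that metrics for invalidation schemes are meant to capture.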