Online result cache invalidation for real-time web search

Authors:
Xiao Bai;Flavio P. Junqueira
Affiliations:
Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain
Venue:
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Year:
2012

Abstract

Caches of results are critical components of modern Web search engines, since they lower the response time for frequent queries and reduce the load on the search engine backend. Results in long-lived cache entries may become stale, however, as search engines continuously update their index to incorporate changes to the Web. Consequently, it is important to provide mechanisms that control the degree of staleness of cached results, ideally enabling the search engine to always return fresh results. In this paper, we present a new mechanism that identifies and invalidates, online, query results that have become stale in the cache. The basic idea is to evaluate, at query time and against recent changes, whether the results of cache hits have changed. To enhance invalidation efficiency, the generation time of cached queries and their chronological order with respect to the latest index update are used to prune unaffected queries early. We evaluate the proposed approach using documents that change over time and query logs of the Yahoo! search engine. We show that the proposed approach ensures good query results (50% fewer stale results) and high invalidation accuracy (90% fewer unnecessary invalidations) compared to a baseline approach that makes invalidation decisions off-line. More importantly, the proposed approach induces less processing overhead, ensuring an average throughput 73% higher than that of the baseline approach.
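
The query-time check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`CacheEntry`, `Update`, `ResultCache`), the logical integer timestamps, the conjunctive term-matching rule, and the convention that an entry generated at time t already reflects every update with time at most t are all assumptions made for illustration. On a cache hit, an entry generated no earlier than the latest index update is returned without further work (the early pruning step); otherwise only the updates newer than the entry are inspected.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    terms: frozenset      # query terms (hypothetical representation)
    results: tuple        # cached document ids
    generated_at: int     # logical time at which the results were computed


@dataclass
class Update:
    time: int             # logical time of the index update
    doc_id: str           # document that was added or changed
    terms: frozenset      # terms of the new document version


@dataclass
class ResultCache:
    entries: dict = field(default_factory=dict)
    updates: list = field(default_factory=list)  # assumed chronological

    def put(self, query, results, now):
        terms = frozenset(query.split())
        self.entries[query] = CacheEntry(terms, tuple(results), now)

    def record_update(self, update):
        self.updates.append(update)

    def get(self, query):
        """Return cached results, or None on a miss or an invalidation."""
        entry = self.entries.get(query)
        if entry is None:
            return None
        # Early pruning (assumption): an entry generated at or after the
        # latest index update cannot have been affected by any update.
        if not self.updates or entry.generated_at >= self.updates[-1].time:
            return entry.results
        # Otherwise inspect only the updates newer than the entry,
        # walking backwards from the most recent one.
        for update in reversed(self.updates):
            if update.time <= entry.generated_at:
                break
            touches_result = update.doc_id in entry.results
            matches_query = entry.terms <= update.terms  # conjunctive match
            if touches_result or matches_query:
                del self.entries[query]  # stale: force recomputation
                return None
        return entry.results
```

In this sketch, an update to an unrelated document leaves the entry in place, while an update to a cached document, or to a document containing all query terms, invalidates it on the next lookup. A real engine would also bound the size of the update buffer and score matches rather than test term containment; both are left out here.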