Modern search engines feature real-time indices that incorporate content changes within seconds. Since search engines also cache search results to reduce user latency and back-end load, without careful real-time management of the results cache the engine may return stale results to users despite the effort invested in keeping the underlying index up to date. A recent paper proposed an architectural component called the cache invalidation predictor (CIP), which invalidates supposedly stale cache entries upon index modifications. The initial evaluation showed that CIPs can retain the performance benefits of caching without sacrificing much of the freshness of the results returned to users. However, that evaluation was conducted on a synthetic workload in a simplified setting, under many assumptions. We propose new CIP heuristics and evaluate them in an authentic environment: on the real evolving corpus and query stream of a large commercial news search engine. Our CIPs operate in conjunction with realistic cache settings, and we use standard metrics for evaluating cache performance. We show that a classical cache replacement policy, LRU, completely fails to guarantee freshness over time, whereas our CIPs serve 97% of the queries with fresh results. Our policies incur a negligible impact on the baseline's cache hit rate, in contrast with traditional age-based invalidation, which must severely degrade cache performance to achieve the same freshness. We demonstrate that the computational overhead of our algorithms is minor, and that they even allow reducing the cache's memory footprint.
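To make the interplay between LRU replacement and CIP-style invalidation concrete, here is a minimal Python sketch of a result cache that evicts by recency and also drops entries a predictor flags as stale after an index update. The class name, the term-overlap predictor, and all method names are illustrative assumptions for this sketch; they are not the paper's actual data structures or heuristics, which are more selective.

```python
from collections import OrderedDict


class InvalidatingLRUCache:
    """Illustrative result cache: LRU eviction plus CIP-style invalidation.

    NOTE: a hypothetical sketch, not the algorithm evaluated in the paper.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # query string -> cached result list

    def get(self, query):
        """Return cached results, or None on a miss (caller then calls put)."""
        if query not in self.entries:
            return None
        self.entries.move_to_end(query)  # refresh the entry's LRU position
        return self.entries[query]

    def put(self, query, results):
        """Cache results, evicting the least recently used entry if full."""
        self.entries[query] = results
        self.entries.move_to_end(query)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop least recently used

    def invalidate(self, changed_terms):
        """Drop entries the predictor deems stale after an index update.

        Toy predictor (an assumption of this sketch): a cached query is
        stale if it shares any term with the modified documents.
        """
        stale = [q for q in self.entries
                 if set(q.split()) & changed_terms]
        for q in stale:
            del self.entries[q]
        return stale
```

The sketch shows why invalidation and replacement are orthogonal: LRU alone never removes an entry just because the index changed, which is exactly how a pure-LRU cache ends up serving stale results.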