A refreshing perspective of search engine caching

Authors:
Berkant Barla Cambazoglu;Flavio P. Junqueira;Vassilis Plachouras;Scott Banachowski;Baoqiu Cui;Swee Lim;Bill Bridge
Affiliations:
Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain;Athens University of Economics and Business, Athens, Greece;Yahoo! Search, Sunnyvale, CA, USA;Yahoo! Search, Sunnyvale, CA, USA;Yahoo! Search, Sunnyvale, CA, USA;Oracle Corporation, Redwood Shores, CA, USA
Venue:
Proceedings of the 19th international conference on World wide web
Year:
2010


Abstract

Commercial Web search engines have to process user queries over huge Web indexes under tight latency constraints. In practice, to achieve low latency, large result caches are employed and a portion of the query traffic is served using previously computed results. Moreover, search engines need to update their indexes frequently to incorporate changes to the Web. After every index update, however, the content of cache entries may become stale, thus decreasing the freshness of served results. In this work, we first argue that the real problem in today's caching for large-scale search engines is not eviction policies, but the ability to cope with changes to the index, i.e., cache freshness. We then introduce a novel algorithm that uses a time-to-live value to set cache entries to expire and selectively refreshes cached results by issuing refresh queries to back-end search clusters. The algorithm prioritizes the entries to refresh according to a heuristic that combines the frequency of access with the age of an entry in the cache. In addition, for setting the rate at which refresh queries are issued, we present a mechanism that takes into account idle cycles of back-end servers. Evaluation using a real workload shows that our algorithm can achieve hit rate improvements as well as reduction in average hit ages. An implementation of this algorithm is currently in production use at Yahoo!.
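
The refresh scheme outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the exact priority heuristic nor the rate-setting mechanism, so the product `hits * age`, the `refresh_budget` helper, and all class, function, and parameter names below are assumptions chosen for illustration only.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Entry:
    results: list
    refreshed_at: float  # time at which the results were last computed
    hits: int = 0        # number of times the entry has been requested


class RefreshingCache:
    """Result cache with TTL expiry and selective refresh (illustrative)."""

    def __init__(self, ttl, backend):
        self.ttl = ttl          # time-to-live, in the same unit as `now`
        self.backend = backend  # hypothetical callable: query -> results
        self.entries = {}

    def get(self, query, now):
        entry = self.entries.get(query)
        if entry is not None and now - entry.refreshed_at < self.ttl:
            entry.hits += 1
            return entry.results  # hit on a non-expired entry
        # Miss or expired entry: recompute at the back end.
        results = self.backend(query)
        hits = entry.hits + 1 if entry is not None else 1
        self.entries[query] = Entry(results, now, hits)
        return results

    def priority(self, entry, now):
        # Assumed heuristic: frequently accessed, older entries come first.
        return entry.hits * (now - entry.refreshed_at)

    def refresh(self, now, budget):
        """Re-issue up to `budget` refresh queries, highest priority first."""
        candidates = heapq.nlargest(
            budget,
            self.entries.items(),
            key=lambda item: self.priority(item[1], now),
        )
        refreshed = []
        for query, entry in candidates:
            entry.results = self.backend(query)
            entry.refreshed_at = now  # the entry's age restarts from zero
            refreshed.append(query)
        return refreshed


def refresh_budget(idle_fraction, max_per_tick):
    """Assumed rate control: scale refresh queries with back-end idleness."""
    clamped = max(0.0, min(1.0, idle_fraction))
    return int(clamped * max_per_tick)
```

In this sketch a caller would invoke `refresh(now, refresh_budget(idle_fraction, max_per_tick))` periodically, so that refresh queries consume only spare back-end capacity while expired entries are still recomputed on demand.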