Design trade-offs for search engine caching

Authors:
Ricardo Baeza-Yates;Aristides Gionis;Flavio P. Junqueira;Vanessa Murdock;Vassilis Plachouras;Fabrizio Silvestri
Affiliations:
Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain;ISTI -- CNR, Pisa, Italy
Venue:
ACM Transactions on the Web (TWEB)
Year:
2008

Abstract

In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log influence the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
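
The static vs. dynamic distinction in the abstract can be illustrated with a minimal sketch. It assumes a static cache is filled once with the most frequent queries of a past (training) log and never changes, while a dynamic cache is updated on every request, here with LRU eviction as one common policy. The function names, the toy query streams, and the choice of LRU are illustrative assumptions, not the article's algorithms or data.

```python
from collections import Counter, OrderedDict


def static_hit_rate(train, test, capacity):
    """Hit rate of a static cache: contents are fixed from the training log."""
    # Assumption: fill the cache with the `capacity` most frequent past queries.
    cached = {q for q, _ in Counter(train).most_common(capacity)}
    hits = sum(1 for q in test if q in cached)
    return hits / len(test)


def lru_hit_rate(test, capacity):
    """Hit rate of a dynamic cache that evicts the least recently used query."""
    cache = OrderedDict()
    hits = 0
    for q in test:
        if q in cache:
            hits += 1
            cache.move_to_end(q)           # mark as most recently used
        else:
            cache[q] = True                # admit the new query
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(test)


# Hypothetical toy streams: popular queries ("a", "b") interleaved with one-off queries.
train = ["a", "b", "a", "c", "a", "b", "d"]
test = ["a", "e", "b", "f", "a", "g", "b", "a"]

print(static_hit_rate(train, test, capacity=2))  # 0.625
print(lru_hit_rate(test, capacity=2))            # 0.0
```

The toy stream is deliberately constructed so that one-off queries push popular ones out of the small LRU cache. It shows why the comparison depends on the query distribution and says nothing about which policy wins on a real log.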