Three-level caching for efficient query processing in large Web search engines

Authors:
Xiaohui Long;Torsten Suel
Affiliations:
Polytechnic University, Brooklyn, NY;Polytechnic University, Brooklyn, NY
Venue:
WWW '05 Proceedings of the 14th international conference on World Wide Web
Year:
2005


Abstract

Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalability. In particular, two-level caching techniques cache results of repeated identical queries at the frontend, while index data for frequently used query terms are cached in each node at a lower level.

We propose and evaluate a three-level caching scheme that adds an intermediate level of caching for additional performance gains. This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists. We propose and study several offline and online algorithms for the resulting weighted caching problem, which turns out to be surprisingly rich in structure. Our experimental evaluation based on a large web crawl and real search engine query log shows significant performance gains for the best schemes, both in isolation and in combination with the other caching levels. We also observe that a careful selection of cache admission and eviction policies is crucial for best overall performance.
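
The three levels described above can be illustrated with a small sketch. This is a toy model under stated assumptions, not the authors' implementation: the class name `ThreeLevelCache`, its fields, and the "admit every pair" policy are hypothetical, the caches are unbounded dictionaries with no eviction, queries are plain Boolean ANDs without ranking, and only intersections (not projections) are cached. The weighted admission and eviction policies that the abstract calls crucial are deliberately left out.

```python
from itertools import combinations


class ThreeLevelCache:
    """Toy three-level cache for Boolean AND queries (illustrative only)."""

    def __init__(self, index):
        self.index = index    # term -> sorted doc IDs; stands in for the on-disk index
        self.results = {}     # level 1: query (tuple of terms) -> result list
        self.pairs = {}       # level 2: (term_a, term_b) -> cached intersection
        self.lists = {}       # level 3: term -> cached inverted list
        self.disk_reads = 0   # number of inverted lists fetched from "disk"

    def _list(self, term):
        # Level 3 lookup; a miss counts as one simulated disk read.
        if term not in self.lists:
            self.disk_reads += 1
            self.lists[term] = self.index.get(term, [])
        return self.lists[term]

    def query(self, terms):
        key = tuple(sorted(set(terms)))
        # Level 1: identical query seen before.
        if key in self.results:
            return self.results[key]

        remaining = set(key)
        parts = []
        # Level 2: cover as many terms as possible with cached pair intersections.
        for pair in combinations(key, 2):
            if pair in self.pairs and remaining.issuperset(pair):
                parts.append(self.pairs[pair])
                remaining -= set(pair)
        # Level 3 (or disk): fetch full lists for the terms still uncovered.
        parts.extend(self._list(t) for t in sorted(remaining))

        docs = set(parts[0]).intersection(*parts[1:]) if parts else set()
        result = sorted(docs)

        # Naive admission (an assumption): cache the intersection of every term pair.
        for a, b in combinations(key, 2):
            if (a, b) not in self.pairs:
                self.pairs[(a, b)] = sorted(set(self._list(a)) & set(self._list(b)))
        self.results[key] = result
        return result


index = {"web": [1, 2, 3, 5, 8], "search": [2, 3, 5, 7], "engine": [3, 5, 9]}
cache = ThreeLevelCache(index)
first = cache.query(["web", "search"])             # reads two lists from disk
second = cache.query(["web", "search", "engine"])  # reuses the cached pair, reads one list
```

In this sketch the second query needs only the `engine` list from disk, because the `(search, web)` intersection is already cached at the intermediate level. A realistic version would bound each cache by size and weigh the benefit of a cached intersection against the space it occupies.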