Performance of compressed inverted list caching in search engines

Authors:
Jiangong Zhang; Xiaohui Long; Torsten Suel
Affiliations:
Polytechnic University, Brooklyn, NY, USA; Microsoft Corporation, Redmond, WA, USA; Polytechnic University, Brooklyn, NY, USA
Venue:
Proceedings of the 17th international conference on World Wide Web
Year:
2008

Abstract

Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial role in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameters such as disk speed and main memory cache size.