Heavy-tailed distributions and multi-keyword queries

Authors:
Surajit Chaudhuri;Kenneth Church;Arnd Christian König;Liying Sui
Affiliations:
Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA
Venue:
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2007


Abstract

Intersecting inverted indexes is a fundamental operation for many applications in information retrieval and databases. Efficient indexing for this operation is known to be a hard problem for arbitrary data distributions. However, text corpora used in information retrieval applications often obey convenient power-law constraints (also known as Zipf's law or long tails). These constraints allow us to materialize carefully chosen combinations of multi-keyword indexes, which significantly improve worst-case performance without requiring excessive storage. These multi-keyword indexes limit the number of postings accessed when computing arbitrary index intersections. Our evaluation on an e-commerce collection of 20 million products shows that the indexes of up to four arbitrary keywords can be intersected while accessing fewer than 20% of the postings in the largest single-keyword index.
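The idea of bounding postings accessed with materialized multi-keyword indexes can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the selection rule (materialize every pair of keywords whose single-keyword lists both exceed a length `threshold`), the cost model (every chosen list is read in full), and the names `build_index`, `materialize_pairs`, and `query` are all assumptions made for illustration.

```python
from itertools import combinations


def build_index(docs):
    """Map each keyword to the sorted list of ids of documents containing it."""
    index = {}
    for doc_id, words in enumerate(docs):
        for word in set(words):
            index.setdefault(word, []).append(doc_id)
    return index  # ids are appended in increasing order, so lists stay sorted


def materialize_pairs(index, threshold):
    """Precompute intersections for pairs of long lists (hypothetical rule).

    A pair (a, b) with a < b is stored only when both single-keyword lists
    are longer than `threshold`.
    """
    frequent = sorted(w for w, postings in index.items() if len(postings) > threshold)
    pairs = {}
    for a, b in combinations(frequent, 2):
        pairs[(a, b)] = sorted(set(index[a]) & set(index[b]))
    return pairs


def query(keywords, index, pairs):
    """Return (matching doc ids, postings read) for a conjunctive query.

    Assumed cost model: every list chosen for the query is read in full.
    """
    remaining = sorted(set(keywords))
    lists = []
    # Greedily cover keywords with materialized pair lists where available.
    for a, b in combinations(list(remaining), 2):
        if a in remaining and b in remaining and (a, b) in pairs:
            lists.append(pairs[(a, b)])
            remaining.remove(a)
            remaining.remove(b)
    # Fall back to single-keyword lists for keywords not covered by a pair.
    lists.extend(index.get(word, []) for word in remaining)
    if not lists:
        return [], 0
    accessed = sum(len(postings) for postings in lists)
    result = set(lists[0])
    for postings in lists[1:]:
        result &= set(postings)
    return sorted(result), accessed


docs = [
    ["red", "shoe", "leather"],
    ["red", "shoe"],
    ["red", "hat"],
    ["blue", "shoe"],
    ["red", "shoe", "canvas"],
    ["blue", "hat"],
]
index = build_index(docs)
pairs = materialize_pairs(index, threshold=3)

print(query(["red", "shoe"], index, {}))     # single-keyword lists only
print(query(["red", "shoe"], index, pairs))  # uses the materialized pair list
```

In this toy collection, only "red" and "shoe" have lists longer than the threshold, so a single pair index is stored. Queries containing both keywords then read the shorter precomputed intersection instead of two long lists, at the price of the extra storage for that pair.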