Static index pruning in web search engines: Combining term and document popularities with query views

Authors:
Ismail S. Altingovde;Rifat Ozcan;Özgür Ulusoy
Affiliations:
Bilkent University, Turkey;Bilkent University, Turkey;Bilkent University, Turkey
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2012


Abstract

Static index pruning techniques permanently remove a presumably redundant part of an inverted file to reduce the file size and query processing time. These techniques differ in how they decide which parts of an index can be removed safely, that is, without changing the top-ranked query results. As defined in the literature, the query view of a document is the set of query terms that access this particular document, that is, the terms of the queries that retrieve this document among their top results. In this paper, we first propose using query views to improve the quality of the top results, as measured against the results obtained from the original index. We incorporate query views into a number of static pruning strategies, namely term-centric, document-centric, term popularity based, and document access popularity based approaches, and show that the new strategies considerably outperform their counterparts, especially at higher levels of pruning, for both disjunctive and conjunctive query processing. Additionally, we combine the notions of term and document access popularity to form new pruning strategies, and further extend these strategies with query views. The new strategies improve the result quality especially for conjunctive query processing, which is the default and most common search mode of a search engine.
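
The notion of a query view can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the paper: the function names (`build_query_views`, `prune_with_query_views`), the in-memory index layout, and the `keep_fraction` parameter are all hypothetical. The sketch builds query views from a log of queries and their top results, then applies a simple term-centric pruning that keeps the highest-scoring postings of each term and additionally protects any posting whose term belongs to the query view of its document.

```python
from collections import defaultdict


def build_query_views(query_results):
    """Build the query view of each document.

    query_results: iterable of (query_terms, top_doc_ids) pairs, where
    top_doc_ids are the documents returned among the top results of the
    query (assumed to come from a query log; hypothetical input format).
    Returns a dict mapping doc_id -> set of query terms.
    """
    views = defaultdict(set)
    for terms, doc_ids in query_results:
        for doc_id in doc_ids:
            views[doc_id].update(terms)
    return views


def prune_with_query_views(index, views, keep_fraction):
    """Term-centric pruning that never drops a query-view posting.

    index: dict mapping term -> list of (doc_id, score) postings.
    keep_fraction: share of each posting list kept by score (assumption).
    """
    pruned = {}
    for term, postings in index.items():
        # Keep the top-scoring share of the posting list.
        ranked = sorted(postings, key=lambda p: p[1], reverse=True)
        n_keep = int(len(ranked) * keep_fraction)
        kept = {doc_id for doc_id, _ in ranked[:n_keep]}
        # Protect postings whose term is in the document's query view.
        protected = {
            doc_id for doc_id, _ in postings if term in views.get(doc_id, ())
        }
        pruned[term] = [
            (doc_id, score)
            for doc_id, score in postings
            if doc_id in kept or doc_id in protected
        ]
    return pruned


# Toy data: one logged query whose top results are documents 1 and 2.
log = [(("cheap", "flights"), [1, 2])]
index = {
    "flights": [(1, 0.9), (2, 0.2), (3, 0.1), (4, 0.5)],
    "cheap": [(1, 0.3), (4, 0.8), (5, 0.7)],
}
views = build_query_views(log)
pruned = prune_with_query_views(index, views, keep_fraction=0.5)
```

In this toy run, the posting of document 2 for "flights" has a low score and would be dropped by score-based pruning alone, but it survives because "flights" is in the query view of document 2. The posting of document 3, which no logged query retrieved, is removed.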