Dynamic index pruning for effective caching

Authors:
Yohannes Tsegay;Andrew Turpin;Justin Zobel
Affiliations:
RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia
Venue:
Proceedings of the sixteenth ACM conference on information and knowledge management
Year:
2007

Cites (12):
Filtered document retrieval with frequency-sorted indexes
    Journal of the American Society for Information Science
Self-indexing inverted files for fast text retrieval
    ACM Transactions on Information Systems (TOIS)
Efficiency/effectiveness trade-offs in query processing (from theory into practice workshop, 1998 SIGIR conf.)
    ACM SIGIR Forum
Searching the Web: the public and their queries
    Journal of the American Society for Information Science and Technology
Static index pruning for information retrieval systems
    Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Rank-preserving two-level caching for scalable search engines
    Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Improving Web search efficiency via a locality based static pruning method
    WWW '05 Proceedings of the 14th international conference on World Wide Web
Inverted files for text search engines
    ACM Computing Surveys (CSUR)
Pruned query evaluation using pre-computed impacts
    SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Three-Level Caching for Efficient Query Processing in Large Web Search Engines
    World Wide Web
A pipelined architecture for distributed text query evaluation
    Information Retrieval
Cost-aware WWW proxy caching algorithms
    USITS'97 Proceedings of the USENIX Symposium on Internet Technologies and Systems

Cited by (11):
ResIn: a combination of results caching and index pruning for high-performance web search engines
Design trade-offs for search engine caching
    ACM Transactions on the Web (TWEB)
A Cost-Aware Strategy for Query Result Caching in Web Search Engines
    ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Revisiting globally sorted indexes for efficient document retrieval
    Proceedings of the third ACM international conference on Web search and data mining
Efficient term proximity search with term-pair indexes
    CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Engineering basic algorithms of an in-memory text search engine
    ACM Transactions on Information Systems (TOIS)
Cost-Aware Strategies for Query Result Caching in Web Search Engines
    ACM Transactions on the Web (TWEB)
Within-document term-based index pruning with statistical hypothesis testing
    ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
University of Otago at INEX 2010
    INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
Online result cache invalidation for real-time web search
    SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Latency-aware strategy for static list caching in flash-based web search engines
    Proceedings of the 22nd ACM international conference on information & knowledge management

Abstract

[...] RAM and dynamic pruning schemes to reduce query evaluation times. While only a small portion of lists are processed with dynamic pruning, current systems still store the entire inverted list in cache. In this paper we investigate caching only the pieces of the inverted lists that are actually used to answer a query during dynamic pruning. We examine an LRU cache model, and two recently proposed models. We also introduce a new dynamic pruning scheme for impact-ordered inverted lists. Using two large web collections and corresponding query logs we show that, using an LRU cache, our new pruning scheme reduces the number of disk accesses during query processing time by 7%-15% over the state-of-the-art impact-ordered baseline, without reducing answer quality. Surprisingly, however, we discover that using our new pruning scheme makes little difference to disk traffic when the more sophisticated caching schemes are employed.
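
The idea of caching only the processed pieces of each inverted list under an LRU policy can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `PrefixLRUCache`, the block granularity, and the cost model (one disk access whenever a query needs more leading blocks of a list than are cached) are all assumptions made for illustration.

```python
from collections import OrderedDict


class PrefixLRUCache:
    """Hypothetical LRU cache that stores only the leading blocks of each
    inverted list that dynamic pruning actually processed."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks   # total blocks the cache may hold
        self.used = 0                     # blocks currently held
        self.entries = OrderedDict()      # term -> number of cached leading blocks
        self.disk_accesses = 0            # assumed cost model: one access per miss

    def fetch(self, term, blocks_needed):
        """Record that a query processed the first `blocks_needed` blocks of `term`."""
        cached = self.entries.pop(term, 0)
        self.used -= cached
        if blocks_needed > cached:
            # The cached prefix is too short: read the missing part from disk.
            self.disk_accesses += 1
            cached = blocks_needed
        # Evict least recently used prefixes until the new prefix fits.
        while self.entries and self.used + cached > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            self.used -= evicted
        if cached <= self.capacity:
            self.entries[term] = cached   # re-inserted as most recently used
            self.used += cached


cache = PrefixLRUCache(capacity_blocks=4)
for term, blocks in [("a", 2), ("b", 2), ("a", 1), ("c", 2)]:
    cache.fetch(term, blocks)
print(cache.disk_accesses, list(cache.entries))
```

In this toy run the third request is served from the cached prefix of `a`, and the request for `c` evicts `b`, the least recently used entry. A whole-list cache would charge each term its full list length instead of the processed prefix, so fewer terms would fit in the same space.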