We study the process in which search engines with segmented indices serve queries. In particular, we investigate the number of result pages that search engines should prepare during the query processing phase.

Search engine users have been observed to browse through very few pages of results for the queries they submit. This behavior suggests that prefetching many results upon processing an initial query is not efficient, since most of the prefetched results will not be requested by the user who initiated the search. However, a policy that abandons result prefetching in favor of retrieving just the first page of search results might not make optimal use of system resources either.

We argue that for a certain behavior of users, engines should prefetch a constant number of result pages per query. We define a concrete query processing model for search engines with segmented indices, and analyze the cost of such prefetching policies. Based on these costs, we show how to determine the constant that optimizes the prefetching policy. Our results are mostly applicable to local index partitions of the inverted files, but are also applicable to processing short queries in global index architectures.
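The trade-off described above can be sketched numerically. The toy model below is an illustration under stated assumptions, not the paper's actual cost analysis: we assume users view a geometrically distributed number of result pages, and that each fetch against the index incurs a fixed overhead `c_fixed` plus a per-page cost `c_page`. A policy that prefetches `k` pages per fetch then needs `ceil(n / k)` fetches to serve a user who views `n` pages, and we can search for the constant `k` that minimizes expected cost.

```python
import math

def expected_cost(k, p_view, c_fixed, c_page, max_pages=200):
    """Expected cost of serving one query when prefetching k pages per fetch.

    Illustrative assumptions (not the paper's model):
      - after viewing a page, the user requests the next one with
        probability p_view (geometric number of pages viewed);
      - each fetch costs c_fixed plus c_page per page fetched.
    The sum is truncated at max_pages, which is harmless for p_view < 1.
    """
    cost = 0.0
    for n in range(1, max_pages + 1):
        p_n = (p_view ** (n - 1)) * (1 - p_view)  # P(user views exactly n pages)
        fetches = math.ceil(n / k)                # batches of k pages needed
        cost += p_n * fetches * (c_fixed + k * c_page)
    return cost

def best_k(p_view, c_fixed, c_page, k_max=20):
    """Constant prefetch size minimizing expected cost under the toy model."""
    return min(range(1, k_max + 1),
               key=lambda k: expected_cost(k, p_view, c_fixed, c_page))
```

Under this model the optimum reflects the tension in the abstract: when the fixed per-fetch overhead dominates (`c_fixed` large relative to `c_page`), batching several pages per fetch pays off; when fetching is cheap per invocation, prefetching beyond the first page only wastes work on pages most users never request.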