We study the process by which search engines with segmented indices serve queries. In particular, we investigate how many result pages a search engine should prepare during the query processing phase. Search engine users have been observed to browse through very few pages of results per query. This behavior suggests that prefetching many results upon processing an initial query is inefficient, since most of the prefetched results will never be requested by the user who initiated the search. However, a policy that abandons result prefetching in favor of retrieving just the first page of search results may not make optimal use of system resources either. We argue that for a certain behavior of users, engines should prefetch a constant number of result pages per query. We define a concrete query processing model for search engines with segmented indices, and analyze the cost of such prefetching policies. Based on these costs, we show how to determine the constant that optimizes the prefetching policy. Our results apply mostly to local index partitions of the inverted files, but also to the processing of short queries in global index architectures.
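To illustrate the trade-off the abstract describes, here is a toy cost model — my own sketch, not the paper's exact analysis. It assumes a user views a geometric number of result pages (continuing to the next page with probability `p`), that each processing round incurs a fixed query cost `c_query` plus a per-page cost `c_page` for the `k` pages prefetched, and that the expected number of rounds is then 1 / (1 − p^k). Under these assumptions, the cost-minimizing constant `k` can be found by direct search:

```python
# Toy model (illustrative assumptions, not the paper's derivation):
# - user views a geometric number of result pages (continue prob. p),
# - each round costs c_query (fixed) + k * c_page (per prefetched page),
# - expected rounds = sum_j P(more than j*k pages needed) = 1 / (1 - p**k).

def expected_cost(k: int, p: float, c_query: float, c_page: float) -> float:
    """Expected total cost when prefetching k result pages per round."""
    return (c_query + k * c_page) / (1.0 - p ** k)

def best_prefetch(p: float, c_query: float, c_page: float, k_max: int = 50) -> int:
    """Constant k in 1..k_max minimizing the expected cost."""
    return min(range(1, k_max + 1),
               key=lambda k: expected_cost(k, p, c_query, c_page))

if __name__ == "__main__":
    # With a fixed query cost that is large relative to the per-page cost,
    # prefetching more than one page per round pays off.
    print(best_prefetch(p=0.3, c_query=10.0, c_page=1.0))  # prints 2
```

The sketch captures the qualitative point of the abstract: when per-query overhead dominates, fetching only the first page wastes round-trips, while prefetching many pages wastes work on results the user never requests, so a small constant in between minimizes expected cost.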