We study the process in which search engines with segmented indices serve queries. In particular, we investigate the number of result pages that search engines should prepare during the query processing phase.

Search engine users have been observed to browse through very few pages of results for the queries they submit. This behavior suggests that prefetching many results upon processing an initial query is not efficient, since most of the prefetched results will not be requested by the user who initiated the search. However, a policy that abandons result prefetching in favor of retrieving just the first page of search results might not make optimal use of system resources either.

We argue that for a certain behavior of users, engines should prefetch a constant number of result pages per query. We define a concrete query processing model for search engines with segmented indices, and analyze the cost of such prefetching policies. Based on these costs, we show how to determine the constant that optimizes the prefetching policy. Our results are mostly applicable to local index partitions of the inverted files, but are also applicable to processing short queries in global index architectures.
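The trade-off described above can be sketched numerically. The toy model below is an illustration under stated assumptions, not the paper's actual cost analysis: we assume users view a geometrically distributed number of result pages, and that each fetch against the index incurs a fixed overhead `c_fixed` plus a per-page cost `c_page`. A policy that prefetches `k` pages per fetch then needs `ceil(n / k)` fetches to serve a user who views `n` pages, and we can search for the constant `k` that minimizes expected cost.

```python
import math

def expected_cost(k, p_view, c_fixed, c_page, max_pages=200):
    """Expected cost of serving one query when prefetching k pages per fetch.

    Illustrative assumptions (not the paper's model):
      - after viewing a page, the user requests the next one with
        probability p_view (geometric number of pages viewed);
      - each fetch costs c_fixed plus c_page per page fetched.
    The sum is truncated at max_pages, which is harmless for p_view < 1.
    """
    cost = 0.0
    for n in range(1, max_pages + 1):
        p_n = (p_view ** (n - 1)) * (1 - p_view)  # P(user views exactly n pages)
        fetches = math.ceil(n / k)                # batches of k pages needed
        cost += p_n * fetches * (c_fixed + k * c_page)
    return cost

def best_k(p_view, c_fixed, c_page, k_max=20):
    """Constant prefetch size minimizing expected cost under the toy model."""
    return min(range(1, k_max + 1),
               key=lambda k: expected_cost(k, p_view, c_fixed, c_page))
```

Under this model the optimum reflects the tension in the abstract: when the fixed per-fetch overhead dominates (`c_fixed` large relative to `c_page`), batching several pages per fetch pays off; when fetching is cheap per invocation, prefetching beyond the first page only wastes work on pages most users never request.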