Performance and cost tradeoffs in Web search

Authors:
Nick Craswell; Francis Crimmins; David Hawking; Alistair Moffat
Affiliations:
CSIRO -- ICT Centre, Canberra, ACT, Australia; CSIRO -- ICT Centre, Canberra, ACT, Australia; CSIRO -- ICT Centre, Canberra, ACT, Australia; The University of Melbourne, Victoria, Australia
Venue:
ADC '04 Proceedings of the 15th Australasian database conference - Volume 27
Year:
2004

Abstract

Web search engines crawl the web to fetch the data that they index. In this paper we re-examine that need, evaluating both the network costs associated with data acquisition and alternative ways in which a search service might be supported. As a concrete example, we use the Research Finder search service provided at http://rf.panopticsearch.com, together with information derived from its crawl and query logs. Based on an analysis of the Research Finder system, we introduce a hybrid arrangement in which queries are evaluated partly against a centrally maintained index representing a subset of the collection, and partly by forwarding them to the local search services maintained by the sites that make up the balance of the collection. We also examine various ways in which crawling costs can be reduced.
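
As an illustration only, the hybrid arrangement described in the abstract might be sketched as follows. This is not the authors' implementation: the names `Hit`, `SearchFn`, `hybrid_search`, and `centrally_indexed` are hypothetical, the backends are modelled as plain functions, and the merge step naively assumes that scores from different sources are comparable, which a real system would have to address.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Hit:
    url: str
    score: float
    source: str  # "central", or the name of the local service that answered


# Hypothetical backend interface: query string in, list of hits out.
SearchFn = Callable[[str], List[Hit]]


def hybrid_search(
    query: str,
    central_index: SearchFn,
    local_services: Dict[str, SearchFn],
    centrally_indexed: Set[str],
    k: int = 10,
) -> List[Hit]:
    """Answer a query from the central index plus forwarded local searches."""
    # Part 1: the centrally maintained index covers a subset of the collection.
    hits = list(central_index(query))

    # Part 2: forward the query to the local search service of every site
    # that is NOT represented in the central index.
    for site, search in local_services.items():
        if site in centrally_indexed:
            continue  # already covered centrally; no forwarding needed
        hits.extend(search(query))

    # Naive merge (assumption): treat scores as comparable across sources.
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:k]
```

In this sketch, sites listed in `centrally_indexed` incur crawling cost but no per-query forwarding, while the remaining sites incur no crawling cost but one forwarded request per query; choosing which sites fall into each group is the tradeoff the paper examines.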