Quantifying performance and quality gains in distributed web search engines

Authors:
B. Barla Cambazoglu; Vassilis Plachouras; Ricardo Baeza-Yates
Affiliations:
Yahoo! Research, Barcelona, Spain (all three authors)
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

Distributed search engines based on geographical partitioning of a central Web index are emerging as a feasible response to the immense growth of the Web, its user base, and query traffic. However, there is still little research quantifying the performance and quality gains that such architectures can achieve. In this paper, we develop several cost models to evaluate the performance benefits of a geographically distributed search engine architecture based on partial index replication and query forwarding. Specifically, we focus on the performance gains that may arise from distributing the query processing and Web crawling processes. We show that any response time gain achieved by distributed query processing can be used to improve search relevance, because it makes room for more complex but more accurate document ranking algorithms. We also show that distributed Web crawling leads to better Web coverage, and we investigate whether this improves search quality. We verify our claims through simulations on large, real-life datasets.
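The abstract's central argument is that time saved by answering a query at a nearby site can be spent on a more expensive ranking function. This can be illustrated with a toy latency model. The sketch below is not the paper's cost model. It assumes a fixed response-time budget, a single central site as the baseline, and a local site that answers a query on its own with probability `p_local` (its partial index suffices) and otherwise forwards it to a remote site at the cost of one extra inter-site round trip. All function names and latency figures are hypothetical placeholders.

```python
"""Toy response-time model for a multi-site search engine (hypothetical numbers)."""


def centralized_time(user_to_center_rtt_ms: float, processing_ms: float) -> float:
    # Every query travels to the single central site and is processed there.
    return user_to_center_rtt_ms + processing_ms


def distributed_time(user_to_local_rtt_ms: float, local_to_remote_rtt_ms: float,
                     processing_ms: float, p_local: float) -> float:
    # With probability p_local the local site's partial index suffices and the
    # query is answered locally; otherwise it is forwarded to a remote site,
    # paying one extra inter-site round trip (processing cost assumed equal).
    if not 0.0 <= p_local <= 1.0:
        raise ValueError("p_local must be in [0, 1]")
    local = user_to_local_rtt_ms + processing_ms
    forwarded = local + local_to_remote_rtt_ms
    return p_local * local + (1.0 - p_local) * forwarded


def ranking_slack(budget_ms: float, response_ms: float) -> float:
    # Time left under the budget that a costlier ranking function could use.
    return max(0.0, budget_ms - response_ms)


if __name__ == "__main__":
    BUDGET_MS = 400.0  # hypothetical target response time
    central = centralized_time(user_to_center_rtt_ms=150.0, processing_ms=100.0)
    multi = distributed_time(user_to_local_rtt_ms=20.0, local_to_remote_rtt_ms=130.0,
                             processing_ms=100.0, p_local=0.8)
    print(f"centralized: {central:.1f} ms, slack {ranking_slack(BUDGET_MS, central):.1f} ms")
    print(f"distributed: {multi:.1f} ms, slack {ranking_slack(BUDGET_MS, multi):.1f} ms")
```

With these made-up numbers, the multi-site setup leaves more of the budget for ranking. The gain shrinks as `p_local` falls, because every forwarded query pays the extra inter-site round trip.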