Web search engines are often implemented as centralized systems. Designing and implementing a Web search engine in a distributed environment is a challenging engineering task that raises many research questions. However, distributing a search engine across multiple sites has several advantages, such as using fewer computing resources and exploiting data locality. In this paper we investigate the cost-effectiveness of building a distributed Web search engine. We propose a model for assessing the total cost of a distributed Web search engine that includes the computational costs and the communication cost among all distributed sites. We then present a query-processing algorithm that maximizes the number of queries answered locally without sacrificing result quality relative to a centralized search engine. We simulate the algorithm on real document collections and query workloads to measure the actual parameters needed for our cost model, and we show that a distributed search engine can be competitive with a centralized architecture with respect to real cost.
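As a rough illustration of the kind of cost model the abstract describes, the sketch below adds a computational cost per site to a communication cost for queries that cannot be answered locally. It is not the paper's model: the `Site` class, the `total_cost` function, the flat per-query forwarding cost, and all numeric values are assumptions chosen only to make the two cost components concrete.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One site of a hypothetical multi-site search engine."""
    name: str
    compute_cost: float  # assumed cost of operating this site's servers


def total_cost(sites, forwarded_queries, cost_per_forwarded_query):
    """Return computational cost summed over sites plus communication cost.

    Assumption: communication cost grows linearly with the number of
    queries forwarded to another site. The paper may model it differently.
    """
    compute = sum(site.compute_cost for site in sites)
    communication = forwarded_queries * cost_per_forwarded_query
    return compute + communication


if __name__ == "__main__":
    sites = [Site("site_a", 10.0), Site("site_b", 5.0)]
    # The more queries are answered locally, the fewer are forwarded,
    # which lowers the communication term.
    print(total_cost(sites, forwarded_queries=4, cost_per_forwarded_query=0.5))
```

Under these assumptions, an algorithm that answers more queries locally lowers `forwarded_queries` and therefore the communication term, which is the lever the abstract refers to when comparing a distributed design against a centralized one.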