Query processing in Web search engines today is performed mainly within a single site or data center, which must scale as the Web grows and as users demand fast answers to their queries. Constraints on the size and cost of data centers, however, may limit the scalability of search engines. Multi-site search engines that perform distributed query processing are one way to overcome such constraints. Each site processes as many queries as possible locally, keeping latency low by not contacting remote sites. Whether a query is forwarded to remote sites depends on the document collections those sites hold. Multi-site search engines pose several new challenges. When a site updates its index, it has to inform the other sites. These updates, however, are not instantaneous, because of the volume of data exchanged or possible network failures. While indexes are inconsistent across sites, queries may not be forwarded optimally. In this work, we investigate the impact of index inconsistencies on a distributed query processing algorithm in the presence of index updates. We observe that delayed propagation of index information reduces the effectiveness of query processing, because queries are less likely to be routed optimally.
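The effect described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions and is not the algorithm studied in the paper: queries are single terms, each site advertises the best score it can offer per term, and a site forwards a query only when a remote site's advertised score beats its own local best. The names `Site`, `remote_view`, `summary`, and `answer` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    # Local index: term -> {doc_id: score}.
    index: dict = field(default_factory=dict)
    # Possibly stale view of remote sites: site name -> {term: best score}.
    remote_view: dict = field(default_factory=dict)

    def add(self, term, doc_id, score):
        """Index a document locally (an index update)."""
        self.index.setdefault(term, {})[doc_id] = score

    def summary(self):
        """Best score this site can offer per term; this is what it advertises."""
        return {t: max(p.values()) for t, p in self.index.items()}

    def best_local(self, term):
        """Return (doc_id, score) of the best local match, or (None, 0.0)."""
        postings = self.index.get(term, {})
        if not postings:
            return None, 0.0
        doc = max(postings, key=postings.get)
        return doc, postings[doc]


def answer(term, local, sites):
    """Answer locally; forward only to sites whose advertised score beats ours."""
    doc, score = local.best_local(term)
    local_score = score
    forwarded = []
    for name, view in local.remote_view.items():
        # The decision uses the local view, which may lag behind the remote index.
        if view.get(term, 0.0) > local_score:
            forwarded.append(name)
            r_doc, r_score = sites[name].best_local(term)
            if r_score > score:
                doc, score = r_doc, r_score
    return doc, forwarded


a, b = Site("A"), Site("B")
a.add("jaguar", "a1", 0.6)
b.add("jaguar", "b1", 0.4)
sites = {"A": a, "B": b}
a.remote_view["B"] = b.summary()    # views are consistent at this point

b.add("jaguar", "b2", 0.9)          # B updates its index
stale = answer("jaguar", a, sites)  # A has not yet heard about the update
a.remote_view["B"] = b.summary()    # the update finally propagates
fresh = answer("jaguar", a, sites)
```

In this toy run, site A answers from its own index while its view of B is stale and therefore misses B's new, better document; once the summary propagates, the same query is forwarded to B. The scores and the single-term forwarding rule are made up for the example.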