Distributed queries and incremental updates in information retrieval systems
Optimal communication algorithms on star graphs using spanning tree constructions
Journal of Parallel and Distributed Computing
LogP: Towards a Realistic Model of Parallel Computation
A UML Variant for Modeling System Searchability
OOIS '02 Proceedings of the 8th International Conference on Object-Oriented Information Systems
In this paper, a formal model for the domain of Internet search is presented that makes it possible to quantify the relations between the important parameters of a distributed search architecture. Among these are physical network parameters, query frequency, the required currency of search results, the change rate of the data to be searched, logical network topology, and the total bandwidth consumed in answering one query. The model is then used to derive many important relations between these parameters. The results can be used to quantitatively assess, streamline, and optimize distributed Internet search architectures. They support the general perception that a centralized approach to Internet-scale search will no longer be able to provide the desired coverage and currency, especially because the Internet's content keeps growing much faster than the bandwidth available to index it. Using a hierarchical distribution approach, together with change-based update notifications instead of polling for changes, makes it possible to address sets of objects that are several orders of magnitude larger than what a centralized approach can handle. Yet such an approach does not significantly increase the total bandwidth required for a single query per object reached by the search.
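The contrast between polling and change-based notifications can be illustrated with a deliberately simplified cost model. This is a minimal sketch, not the paper's formal model: it assumes that polling must contact every object once per currency window T to keep results at most T seconds stale, that each change triggers exactly one notification, and that polls and notifications cost the same number of bytes. The names `UpdateScenario`, `polling_bandwidth`, `notification_bandwidth`, and `addressable_objects` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UpdateScenario:
    """Hypothetical parameters for keeping a remote search index current."""
    num_objects: int      # N: number of objects to be indexed
    change_rate: float    # lambda: changes per object per second
    currency: float       # T: maximum tolerated staleness, in seconds
    message_bytes: int    # s: bytes per poll or per notification


def polling_bandwidth(sc: UpdateScenario) -> float:
    # Assumption: every object is checked once per currency window T,
    # whether or not it changed, so the cost is N * s / T bytes per second.
    return sc.num_objects * sc.message_bytes / sc.currency


def notification_bandwidth(sc: UpdateScenario) -> float:
    # Assumption: one message is sent per actual change, so the cost is
    # N * lambda * s bytes per second, independent of T.
    return sc.num_objects * sc.change_rate * sc.message_bytes


def addressable_objects(budget: float, sc: UpdateScenario,
                        use_notifications: bool) -> float:
    # How many objects a fixed bandwidth budget (bytes/s) can keep current.
    if use_notifications:
        per_object = sc.change_rate * sc.message_bytes
    else:
        per_object = sc.message_bytes / sc.currency
    return budget / per_object


if __name__ == "__main__":
    day = 86_400.0
    # Illustrative values only: objects change about once every 100 days,
    # and results may be at most one day stale.
    sc = UpdateScenario(num_objects=1_000_000, change_rate=1 / (100 * day),
                        currency=day, message_bytes=500)
    print("polling (B/s):", polling_bandwidth(sc))
    print("notifications (B/s):", notification_bandwidth(sc))
    print("ratio:", polling_bandwidth(sc) / notification_bandwidth(sc))
```

Under these assumptions the polling-to-notification cost ratio is 1 / (lambda * T), so with the illustrative values above polling costs roughly 100 times as much, and the same bandwidth budget keeps roughly 100 times as many objects current with notifications. When objects change more often than once per window (lambda * T > 1), the ratio reverses unless notifications are batched, which this sketch does not model.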