There is a growing trend to run resource-intensive applications not in large dedicated computing centers but on computer systems distributed over the Internet. Grid middleware provides a framework for accessing these resources. This paper evaluates one specific grid middleware, the Globus Toolkit, for data-intensive applications. As a test case, we have designed and implemented a service-based distributed web crawler on top of this middleware. A web crawler is a complex application consisting of many nodes, and it places significantly higher demands on the administrative flexibility of grid middleware than grid applications that merely allocate the computing power of grid nodes. We have observed that some components of the Globus Toolkit are flexible enough to provide the control functionality a web crawler needs, while others are not; for the latter, we propose possible extensions. Since we expect this combination of characteristics to occur in many other grid applications as well, our study is of interest beyond web crawling.