A Scalable Lightweight Distributed Crawler for Crawling with Limited Resources

Authors:
Milly Kc; Markus Hagenbuchner; Ah Chung Tsoi
Venue:
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
Year:
2008

Abstract

Web page crawlers are an essential component of many web applications. The sheer size of the Internet poses problems for the design of web crawlers. All currently known crawlers rely on approximations or accept limitations in order to maximize crawl throughput, and hence the number of pages that can be retrieved within a given time frame. This paper proposes a distributed crawling concept that is designed to avoid approximations, to limit network overhead, and to run on relatively inexpensive hardware. A set of experiments and comparisons highlights the effectiveness of the proposed approach.