Design of a crawler with bounded bandwidth

Authors:
Michelangelo Diligenti; Marco Maggini; Filippo Maria Pucci; Franco Scarselli
Affiliations:
Università di Siena, Via Roma, Siena, Italy; Università di Siena, Via Roma, Siena, Italy; Università di Siena, Via Roma, Siena, Italy; Università di Siena, Via Roma, Siena, Italy
Venue:
Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
Year:
2004

Abstract

This paper presents an algorithm to bound the bandwidth of a Web crawler. The crawler collects statistics on the transfer rate of each server to predict the expected bandwidth use for future downloads. The prediction allows us to activate the optimal number of fetcher threads in order to exploit the assigned bandwidth. The experimental results show the effectiveness of the proposed technique.
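
The abstract does not spell out the estimator or the thread-selection rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes an exponential moving average for each server's transfer rate and a greedy rule that starts queued downloads while the predicted total stays within the assigned bandwidth. The names `RateEstimator` and `select_fetches`, the smoothing weight `alpha`, and the default rate for unseen servers are all hypothetical.

```python
class RateEstimator:
    """Tracks a per-server transfer rate (bytes/s) as an exponential moving average.

    The averaging scheme is an assumption; the abstract only says that
    statistics on each server's transfer rate are collected.
    """

    def __init__(self, alpha=0.3, default_rate=50_000.0):
        self.alpha = alpha                # weight of the newest sample (assumed)
        self.default_rate = default_rate  # guess for servers never seen (assumed)
        self.rates = {}

    def observe(self, server, n_bytes, seconds):
        """Fold one completed download into the server's estimate."""
        if seconds <= 0:
            return  # ignore degenerate timings
        sample = n_bytes / seconds
        old = self.rates.get(server)
        if old is None:
            self.rates[server] = sample
        else:
            self.rates[server] = self.alpha * sample + (1 - self.alpha) * old

    def predict(self, server):
        """Expected bandwidth use of a future download from this server."""
        return self.rates.get(server, self.default_rate)


def select_fetches(pending_servers, estimator, budget, in_use=0.0):
    """Greedily choose queued downloads whose predicted rates fit the budget.

    Returns the chosen servers (one fetcher thread each) and the predicted
    total bandwidth once they are running.
    """
    chosen = []
    predicted = in_use
    for server in pending_servers:
        rate = estimator.predict(server)
        if predicted + rate <= budget:
            chosen.append(server)
            predicted += rate
    return chosen, predicted


if __name__ == "__main__":
    est = RateEstimator(alpha=0.5)
    est.observe("a.example", 100_000, 1.0)  # fast server
    est.observe("b.example", 40_000, 2.0)   # slow server
    queue = ["a.example", "b.example", "c.example"]
    print(select_fetches(queue, est, budget=150_000))
```

A greedy pass is the simplest possible selection rule; a real crawler would also have to re-run it as downloads finish and as the estimates drift.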