This paper evaluates scalable distributed crawling based on a geographical partition of the Web. The approach relies on multiple distributed crawlers, each responsible for the pages belonging to one or more previously identified geographical zones. Pages are assigned to crawlers according to the geographical scope of their content. For the initial assignment of a page to a partition, we use a simple heuristic that gives the page the same scope as the geographical location of the web server hosting it. During download, if analysis of the page's content indicates a different geographical scope, the page is forwarded to the server of the crawler responsible for that zone. A sample of Portuguese Web pages, collected during 2005, was used to evaluate (a) page download communication times and (b) the overhead of page exchanges among servers. The evaluation results allow our approach to be compared with conventional hash-based partitioning strategies.
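The two-stage assignment described above can be illustrated with a minimal sketch. It assumes a hypothetical `SERVER_ZONE` table mapping host names to the zone of their web server (standing in for server geolocation) and a hypothetical `ZONE_TERMS` gazetteer; the content-scope step simply counts place-name mentions, which is an assumed stand-in, not the scope-detection method used in the paper. Each time content analysis disagrees with the server-location heuristic, the page is counted as one forward between crawler servers. A host-hashed baseline (`hash_partition`, using CRC32 as an assumed hash) is included for comparison.

```python
import zlib
from collections import Counter
from urllib.parse import urlparse

# Hypothetical: zone of each web server (in practice, e.g. from IP geolocation).
SERVER_ZONE = {
    "www.example-lisboa.pt": "lisboa",
    "www.example-porto.pt": "porto",
}

# Hypothetical gazetteer: place names whose mention suggests a page's scope.
ZONE_TERMS = {
    "lisboa": {"lisboa", "sintra", "cascais"},
    "porto": {"porto", "gaia", "braga"},
}


def initial_zone(url, default_zone):
    """Initial heuristic: the page takes the zone of its hosting server."""
    host = urlparse(url).hostname or ""
    return SERVER_ZONE.get(host, default_zone)


def content_zone(text):
    """Assumed content-scope detector: zone with most place-name hits, or None."""
    words = Counter(text.lower().split())
    scores = {zone: sum(words[t] for t in terms) for zone, terms in ZONE_TERMS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def crawl_geo(pages, default_zone):
    """Assign (url, text) pages to zones; count forwards between crawler servers."""
    assignment = {}
    forwards = 0
    for url, text in pages:
        zone = initial_zone(url, default_zone)
        scoped = content_zone(text)
        if scoped is not None and scoped != zone:
            # Content disagrees with server location: forward the page.
            zone = scoped
            forwards += 1
        assignment[url] = zone
    return assignment, forwards


def hash_partition(url, n_crawlers):
    """Conventional baseline: partition by a hash of the host name."""
    host = urlparse(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % n_crawlers
```

For example, a page served from a Lisbon host whose text mentions only Porto and Braga would be reassigned to the `porto` crawler and counted as one exchange; the hash baseline never forwards pages, but it ignores geography entirely.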