Collaborative Web Crawling: Information Gathering/Processing over Internet

  • Authors:
  • Shang-Hua Teng, Qi Lu, Matthias Eichstaedt, Daniel Ford, Tobin Lehman

  • Venue:
  • HICSS '99: Proceedings of the Thirty-second Annual Hawaii International Conference on System Sciences, Volume 5
  • Year:
  • 1999

Abstract

IBM Almaden Research Center, 650 Harry Road, San Jose, California 95120-6099.

In this paper, we present a scalable method for collaborative web crawling and information processing. The method includes an automatic cyberspace partitioner designed to dynamically balance and re-balance the load among processors. It can be used both when all web crawlers are located on a tightly coupled high-performance system and when they are scattered across a distributed environment. We have implemented our algorithms in Java as part of the IBM Grand Central Station (GCS) system.
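The abstract does not describe how the cyberspace partitioner works internally. Purely as an illustration, the Java sketch below assumes one common approach, which is not necessarily the paper's. Each URL's host is hashed into a fixed number of buckets, the buckets are assigned to crawlers, and the load is re-balanced by moving whole buckets from the most loaded crawler to the least loaded one. The class and method names (`CyberspacePartitioner`, `recordWork`, `rebalance`) and the notion of per-bucket "work" are hypothetical.

```java
import java.net.URI;

/**
 * Hypothetical sketch of a load-balancing URL-space partitioner (not the GCS code).
 * URLs are grouped into hash buckets by host; buckets, not single URLs, are owned by crawlers.
 */
class CyberspacePartitioner {
    private final int numBuckets;
    private final int numCrawlers;
    private final int[] owner;        // owner[b] = index of the crawler responsible for bucket b
    private final long[] bucketLoad;  // accumulated work (e.g. pages fetched) observed per bucket

    CyberspacePartitioner(int numBuckets, int numCrawlers) {
        this.numBuckets = numBuckets;
        this.numCrawlers = numCrawlers;
        this.owner = new int[numBuckets];
        this.bucketLoad = new long[numBuckets];
        for (int b = 0; b < numBuckets; b++) {
            owner[b] = b % numCrawlers; // initial round-robin assignment of buckets
        }
    }

    /** Hashes the (lower-cased) host so that every page of one site lands in the same bucket. */
    int bucketOf(String url) {
        String host = URI.create(url).getHost();
        if (host == null) host = url; // fall back to the raw string if no host can be parsed
        return Math.floorMod(host.toLowerCase().hashCode(), numBuckets);
    }

    int crawlerFor(String url) { return owner[bucketOf(url)]; }

    void recordWork(String url, long cost) { bucketLoad[bucketOf(url)] += cost; }

    /** Sums bucket loads per owning crawler. */
    long[] crawlerLoads() {
        long[] loads = new long[numCrawlers];
        for (int b = 0; b < numBuckets; b++) loads[owner[b]] += bucketLoad[b];
        return loads;
    }

    /**
     * Greedy re-balancing: move a bucket of load w from the most to the least loaded crawler
     * whenever 0 < w < gap. Each such move strictly lowers the sum of squared crawler loads,
     * so the loop terminates. Returns the number of buckets moved.
     */
    int rebalance() {
        int moves = 0;
        while (true) {
            long[] loads = crawlerLoads();
            int max = 0, min = 0;
            for (int c = 1; c < numCrawlers; c++) {
                if (loads[c] > loads[max]) max = c;
                if (loads[c] < loads[min]) min = c;
            }
            long gap = loads[max] - loads[min];
            int best = -1; // candidate bucket whose load is closest to gap / 2
            for (int b = 0; b < numBuckets; b++) {
                if (owner[b] != max || bucketLoad[b] == 0 || bucketLoad[b] >= gap) continue;
                if (best < 0 || Math.abs(2 * bucketLoad[b] - gap) < Math.abs(2 * bucketLoad[best] - gap)) {
                    best = b;
                }
            }
            if (best < 0) return moves; // no move would reduce the imbalance
            owner[best] = min;
            moves++;
        }
    }
}
```

Moving whole host buckets rather than individual URLs keeps each site on a single crawler, which can simplify per-site politeness and duplicate detection. The trade-off is granularity: one very large site cannot be split, so the achievable balance depends on how many buckets are used.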