Architectural design and evaluation of an efficient web-crawling system

  • Authors:
  • Hongfei Yan;Jianyong Wang;Xiaoming Li;Lin Guo

  • Affiliations:
  • Computer Networks and Distributed Systems Laboratory, Department of Computer Science and Technology, Peking University, Beijing 100871, PR China (all four authors)

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2002

Abstract

This paper presents the architectural design and evaluation results of an efficient Web-crawling system. The design comprises a fully distributed architecture, a URL allocating algorithm, and a method to ensure system scalability and dynamic reconfigurability. Simulation experiments show that load balance, scalability and efficiency can be achieved in the system. This distributed Web-crawling subsystem has been successfully integrated with WebGather, a well-known Chinese and English Web search engine that aims to collect all the Web pages in China and keep pace with the rapid growth of Chinese Web information. In addition, we believe that the design can also be useful in other contexts, such as digital libraries.
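
The abstract names a URL allocating algorithm but does not describe it. As a rough illustration of what URL allocation in a distributed crawler can look like, the sketch below assigns each URL to a crawler node by hashing its host name. This is a common generic technique, not necessarily the authors' method; the function names `assign_node` and `partition` and the choice of MD5 as the hash are assumptions made for this example.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Return the index of the crawler node responsible for this URL.

    Hypothetical helper: hashes the host name so that every URL on the
    same host is handled by the same node.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.md5(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes):
    """Group URLs into one bucket per crawler node (hypothetical helper)."""
    buckets = [[] for _ in range(num_nodes)]
    for url in urls:
        buckets[assign_node(url, num_nodes)].append(url)
    return buckets
```

Hashing by host keeps all pages of one site on one node, which tends to simplify per-site politeness limits. With plain modulo hashing, changing `num_nodes` reassigns most hosts; a scheme such as consistent hashing reduces that movement, which matters for the kind of dynamic reconfiguration the abstract mentions.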