The Implementation of a Web Crawler URL Filter Algorithm Based on Caching

  • Authors:
  • Wang Hui-chang;Ruan Shu-hua;Tang Qi-jie

  • Venue:
  • IWCSE '09 Proceedings of the 2009 Second International Workshop on Computer Science and Engineering - Volume 02
  • Year:
  • 2009

Abstract

For large-scale Web information collection, the URL filter module plays an important role in the Web crawler, which is a central component of a search engine. The performance of the URL filter module directly influences the efficiency of the entire collection system. This paper introduces a URL filter algorithm based on caching and describes its implementation. The stability and parallelism of the algorithm are verified by experiments on Websites that contain a large number of Web pages. The experimental results show that the proposed algorithm achieves satisfactory performance when some of its parameters are adjusted appropriately, and that it is especially suitable for URL filtering on Websites that have many page-navigation links and index pages.
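
The abstract does not spell out the algorithm itself, so the following is only a generic sketch of what a cache-based URL filter could look like, not the authors' method. It assumes an LRU eviction policy and uses an in-memory set as a stand-in for the slower backing store; the names `CachedUrlFilter`, `is_new`, and `cache_size` are hypothetical. The idea it illustrates is that URLs which recur on nearly every page (navigation links, index pages) are answered from the small cache without touching the full store.

```python
from collections import OrderedDict


class CachedUrlFilter:
    """Duplicate-URL filter: a small LRU cache in front of a full seen-set.

    Illustrative only; LRU and the in-memory set are assumptions,
    not details taken from the paper.
    """

    def __init__(self, cache_size=1000):
        self.cache_size = cache_size
        self.cache = OrderedDict()  # recently seen URLs, least recent first
        self.seen = set()           # stand-in for the slower backing store
        self.cache_hits = 0
        self.store_lookups = 0

    def is_new(self, url):
        """Return True the first time a URL is offered, False afterwards."""
        if url in self.cache:
            # Fast path: a cached URL has already been seen.
            self.cache.move_to_end(url)
            self.cache_hits += 1
            return False
        # Slow path: consult the full store, then remember the URL in the cache.
        self.store_lookups += 1
        new = url not in self.seen
        if new:
            self.seen.add(url)
        self.cache[url] = True
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used URL
        return new


# A navigation link ("/index") repeated across pages is served from the cache.
flt = CachedUrlFilter(cache_size=2)
links = ["/index", "/page/1", "/index", "/page/2", "/index"]
fresh = [u for u in links if flt.is_new(u)]
print(fresh, flt.cache_hits, flt.store_lookups)
```

In this sketch `cache_size` is the kind of parameter whose adjustment trades memory for fewer lookups in the backing store; a real crawler would also normalize URLs before filtering.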