A Probabilistic Model for Intelligent Web Crawlers

  • Authors:
  • Ke Hu; Wing Shing Wong

  • Venue:
  • COMPSAC '03 Proceedings of the 27th Annual International Conference on Computer Software and Applications
  • Year:
  • 2003

Abstract

With the enormous growth of the World Wide Web in recent years, the issue of how to discover web pages efficiently has become an important challenge for web crawler designers. In this paper, we outline a simple model to predict the distribution of the search depth that a breadth-first search needs to reach the first web pages relevant to a user query. We define this probability as the crawler confidence. Recent studies indicate that, at a large scale, the Web structure follows power-law distributions in several respects [3][7]. Our work, however, tries to model the microscopic linkage structure of the Web from an intelligent crawler's point of view. With the information provided by crawler confidence, an intelligent crawler can adjust its crawling behavior to achieve a higher harvest rate.
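To make the quantity in the abstract concrete, the following is a minimal sketch of how the depth at which a breadth-first search first reaches a relevant page could be measured empirically over several seed pages. It is not the authors' probabilistic model, which the abstract does not specify. The names `first_relevant_depth`, `depth_distribution`, `toy_graph`, and the relevance predicate are all hypothetical, and the link graph is assumed to be an in-memory adjacency dictionary.

```python
from collections import Counter, deque


def first_relevant_depth(graph, seed, is_relevant, max_depth=10):
    """Breadth-first search from `seed`.

    Returns the depth of the first relevant page found, or None if no
    relevant page is reachable within `max_depth` link hops.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        page, depth = queue.popleft()
        if is_relevant(page):
            return depth
        if depth == max_depth:
            continue  # do not expand pages beyond the depth limit
        for nxt in graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def depth_distribution(graph, seeds, is_relevant, max_depth=10):
    """Empirical fraction of seeds whose first relevant page lies at each depth.

    Seeds that never reach a relevant page count toward the total, so the
    fractions may sum to less than 1.
    """
    depths = [first_relevant_depth(graph, s, is_relevant, max_depth) for s in seeds]
    counts = Counter(d for d in depths if d is not None)
    return {d: c / len(depths) for d, c in sorted(counts.items())}


# Hypothetical toy link graph: page -> list of outgoing links.
toy_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
    "f": ["a"],
}
relevant_pages = {"d"}  # assumed relevance judgment for one query

dist = depth_distribution(toy_graph, ["a", "b", "f", "e"], lambda p: p in relevant_pages)
print(dist)
```

A crawler could compare such an empirical distribution with a model's prediction to decide how deep to search from a given page before moving on; that use is only an illustration of the idea stated in the abstract.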