The Evolution of Link-Attributes for Pages and Its Implications on Web Crawling

  • Authors: Tao Meng, Hongfei Yan, Jimin Wang, Xiaoming Li

  • Affiliations: Peking University, Beijing, China (all authors)

  • Venue: WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year: 2004

Abstract

An incremental crawler needs to know how web pages evolve and how their change frequencies relate to link-attributes such as indegree. This paper proposes a model for incremental crawling and reports an experiment that tests this relation by monitoring the evolution of all the link-attributes of the web pages within one website. In particular, we look closely at one special kind of page, named Index-pages. From the experiment, we draw four conclusions: (1) Pages with larger indegrees, outdegrees, or PageRank values change more often, and these link-attributes all approximately obey a power-law distribution. (2) The link-attributes of pages seldom change even though the pages themselves change. (3) A small proportion of the pages link to most of the vertices in the web graph. (4) The Index-pages link to a sizeable number of the new pages in a website. These conclusions can be used to greatly enhance the performance of an incremental crawler, which is the foremost component of general search engines and web information stores.
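
For readers unfamiliar with the link-attributes named in the abstract, the following is a minimal sketch of how indegree, outdegree, and PageRank could be computed for the pages of one site. It is not the authors' implementation: the function name `link_attributes`, the damping factor of 0.85, the fixed iteration count, and the toy edge list are all assumptions made for illustration.

```python
from collections import defaultdict


def link_attributes(edges, damping=0.85, iterations=50):
    """Return {page: (indegree, outdegree, pagerank)} for a directed link graph.

    `edges` is an iterable of (source_page, target_page) pairs. The damping
    factor and iteration count are illustrative defaults, not values from
    the paper.
    """
    pages = sorted({p for edge in edges for p in edge})
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)   # sets ignore duplicate links
        in_links[dst].add(src)

    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        new_rank = {}
        for p in pages:
            incoming = sum(rank[q] / len(out_links[q]) for q in in_links[p])
            new_rank[p] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank

    return {p: (len(in_links[p]), len(out_links[p]), rank[p]) for p in pages}


# Hypothetical three-page site: an index page linking to two content pages.
site = [("index", "a"), ("index", "b"), ("a", "index"), ("b", "index"), ("b", "a")]
for page, (indeg, outdeg, pr) in link_attributes(site).items():
    print(page, indeg, outdeg, round(pr, 3))
```

Under these assumptions, a crawler could recompute the attributes after each crawl and compare them with the observed change frequency of each page, which is the kind of relation the abstract describes.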