An empirical study on the change of web pages

  • Authors:
  • Sung Jin Kim; Sang Ho Lee

  • Affiliations:
  • School of Computer Science and Engineering, Seoul National University, Seoul, Korea; School of Computing, Soongsil University, Seoul, Korea

  • Venue:
  • APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
  • Year:
  • 2005

Abstract

Because web pages are created, destroyed, and updated dynamically, web databases must be refreshed frequently to keep their copies of web pages up to date. Understanding how web pages change helps administrators manage their web databases. This paper introduces a number of metrics that represent various aspects of the change behavior of web pages. We monitored approximately 1.8 million to 3 million URLs at two-day intervals for 100 days. Using the proposed metrics, we analyze the collected URLs and web pages. In addition, we propose a method that computes the probability that a page will be downloaded on subsequent crawls.
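
The abstract does not describe how the download probability is computed, so the following is only an illustrative sketch and not the authors' method. It assumes a per-URL crawl history recorded as a list of booleans (one entry per crawl, `True` if the page was downloaded) and estimates the probability for the next crawl as a Laplace-smoothed success frequency. The function name `download_probability` and the `prior_successes` and `prior_failures` parameters are hypothetical.

```python
from typing import Sequence


def download_probability(
    history: Sequence[bool],
    prior_successes: float = 1.0,
    prior_failures: float = 1.0,
) -> float:
    """Estimate the chance that a URL is downloaded on the next crawl.

    Assumption (not from the paper): the estimate is the smoothed
    fraction of past crawls on which the download succeeded.
    """
    successes = sum(1 for ok in history if ok)
    failures = len(history) - successes
    # The pseudo-counts keep the estimate away from exactly 0 or 1
    # when only a few crawls have been observed.
    total = successes + failures + prior_successes + prior_failures
    return (successes + prior_successes) / total


if __name__ == "__main__":
    # Hypothetical history: downloaded on 3 of 4 past crawls.
    print(download_probability([True, True, False, True]))
```

With no history at all, this estimator falls back to the prior (0.5 under the default pseudo-counts), which may be a reasonable starting point for newly discovered URLs.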