Estimating the rate of web page updates

  • Authors:
  • Sanasam Ranbir Singh

  • Affiliations:
  • Indian Institute of Technology Madras, Department of Computer Science and Engineering, Chennai, India

  • Venue:
  • IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
  • Year:
  • 2007

Abstract

Estimating the rate of Web page updates helps improve a Web crawler's scheduling policy. However, most Web sources are autonomous and updated independently, so clients such as Web crawlers do not know when or how often the sources change. Unlike other studies, this paper models the process of Web page updates as a nonhomogeneous Poisson process and focuses on determining the localized rate of updates. Various rate estimators are then discussed, and their precision is evaluated experimentally. The paper explores two classes of problems. First, the localized rate of updates is estimated by dividing the given sequence of independent and inconsistent update points into consistent windows. In experimental comparisons, the proposed Weibull estimator outperforms the Duane plot (another proposed estimator) and the estimators proposed by Cho et al. and Norman Matloff in 91.5% of all windows for synthetic datasets and 90.6% of all windows for real Web datasets. Second, future update points are predicted from the most recent window, and the Weibull estimator is found to have higher precision than the other estimators.
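
The abstract does not give the formula of the Weibull estimator, so the following Python sketch is only an illustration of the general idea, not the paper's method. It assumes the common power-law (Weibull) form of a nonhomogeneous Poisson process, with intensity scale * shape * t^(shape - 1), and fits it to the update times observed in a single window using the standard time-truncated maximum-likelihood formulas. The function names and the sample update times are hypothetical.

```python
import math


def fit_power_law_nhpp(update_times, window_end):
    """Fit a power-law NHPP to update times observed in (0, window_end].

    Assumed model (not taken from the abstract):
        intensity(t) = scale * shape * t ** (shape - 1)
    Returns the maximum-likelihood estimates (shape, scale).
    """
    n = len(update_times)
    if n == 0:
        raise ValueError("need at least one update point")
    if any(t <= 0 or t > window_end for t in update_times):
        raise ValueError("update times must lie in (0, window_end]")
    log_sum = sum(math.log(window_end / t) for t in update_times)
    if log_sum <= 0:
        raise ValueError("all updates at window_end; shape is undefined")
    shape = n / log_sum
    scale = n / window_end ** shape
    return shape, scale


def intensity(t, shape, scale):
    """Localized update rate at time t > 0 under the fitted model."""
    return scale * shape * t ** (shape - 1)


def expected_updates(t0, t1, shape, scale):
    """Expected number of updates in the interval (t0, t1]."""
    return scale * (t1 ** shape - t0 ** shape)


# Hypothetical window: four updates observed during the first 8 days.
times = [1.0, 2.0, 4.0, 8.0]
shape, scale = fit_power_law_nhpp(times, window_end=8.0)
rate_now = intensity(8.0, shape, scale)
next_day = expected_updates(8.0, 9.0, shape, scale)
```

A shape below 1 indicates a rate that decreases over the window, and a shape above 1 indicates an increasing rate. By construction, the fitted model reproduces the observed number of updates over the whole window.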