Looking at both the present and the past to efficiently update replicas of web content

  • Authors:
  • Luciano Barbosa;Ana Carolina Salgado;Francisco de Carvalho;Jacques Robin;Juliana Freire

  • Affiliations:
  • University of Utah;Universidade Federal de Pernambuco;Universidade Federal de Pernambuco;Universidade Federal de Pernambuco;University of Utah

  • Venue:
  • Proceedings of the 7th annual ACM international workshop on Web information and data management
  • Year:
  • 2005

Abstract

Since Web sites are autonomous and independently updated, applications that keep replicas of Web data, such as Web warehouses and search engines, must periodically poll the sites and check for changes. Since this is a resource-intensive task, in order to keep the copies up-to-date, it is important to devise efficient update schedules that adapt to the change rate of the pages and avoid visiting pages not modified since the last visit. In this paper, we propose a new approach that learns to predict the change behavior of Web pages based both on the static features and change history of pages, and refreshes the copies accordingly. Experiments using real-world data show that our technique leads to substantial performance improvements compared to previously proposed approaches.
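
The abstract does not specify the learning algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a prior change probability has already been derived from a page's static features, and it blends that prior with the observed change history using a simple pseudo-count average (a technique swapped in here for illustration). The names `PageRecord`, `static_prior`, `change_probability`, `pages_to_visit`, and the parameters `prior_weight` and `threshold` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """Hypothetical per-page state kept alongside a replica."""
    # Assumed prior probability of change, derived elsewhere from static features.
    static_prior: float
    # One entry per past visit: True if that visit found the page changed.
    history: list = field(default_factory=list)


def change_probability(page: PageRecord, prior_weight: float = 2.0) -> float:
    """Blend the static prior with the observed change frequency.

    With no history the estimate equals the prior; as visits accumulate,
    the observed frequency dominates.
    """
    changes = sum(page.history)
    visits = len(page.history)
    return (changes + prior_weight * page.static_prior) / (visits + prior_weight)


def pages_to_visit(pages: dict, threshold: float = 0.5) -> list:
    """Return the URLs whose estimated change probability meets the threshold."""
    return sorted(
        url for url, page in pages.items()
        if change_probability(page) >= threshold
    )


if __name__ == "__main__":
    pages = {
        "a": PageRecord(static_prior=0.8),                       # new page, no history
        "b": PageRecord(static_prior=0.8, history=[False] * 8),  # never seen to change
        "c": PageRecord(static_prior=0.1, history=[True] * 6),   # changed on every visit
    }
    print(pages_to_visit(pages))
```

In this sketch a newly discovered page is scheduled on the strength of its static prior alone, while a page with a long record of unchanged visits is skipped even if its prior was high, which mirrors the stated goal of avoiding visits to pages not modified since the last visit.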