A pocket guide to web history

  • Authors: Klaus Berberich, Srikanta Bedathur, Gerhard Weikum

  • Affiliation (all authors): Max-Planck Institute for Informatics, Saarbrücken, Germany

  • Venue: SPIRE'07, Proceedings of the 14th International Conference on String Processing and Information Retrieval
  • Year: 2007

Abstract

Web archives like the Internet Archive preserve the evolutionary history of large portions of the Web. Access to them, however, is still restricted to rather limited interfaces: search functionality is often missing or ignores the time axis. Time-travel search alleviates this shortcoming by enriching keyword queries with a time-context of interest. To be effective, time-travel queries require historical PageRank scores. In this paper, we address this requirement and propose rank synopses as a novel structure to compactly represent and reconstruct historical PageRank scores. Rank synopses can reconstruct the PageRank score of a web page as of any point during its lifetime, even in the absence of a snapshot of the Web as of that time. We further devise a normalization scheme for PageRank scores to make them comparable across different graphs. Through a comprehensive evaluation over different datasets, we demonstrate the accuracy and space-economy of the proposed methods.
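
The abstract does not say how a rank synopsis is stored internally, so the following is only an illustrative sketch of the general idea of compactly representing a score history and reconstructing a score between snapshots. It assumes a piecewise-linear approximation with a user-chosen error tolerance, and it assumes the input scores have already been normalized so that they are comparable across snapshots. The names `build_synopsis`, `reconstruct`, and `max_error` are hypothetical and do not come from the paper.

```python
from bisect import bisect_right


def _fits(times, scores, i, j, max_error):
    """True if the line from point i to point j is within max_error of every point between them."""
    t0, s0, t1, s1 = times[i], scores[i], times[j], scores[j]
    for k in range(i + 1, j):
        estimate = s0 + (s1 - s0) * (times[k] - t0) / (t1 - t0)
        if abs(estimate - scores[k]) > max_error:
            return False
    return True


def build_synopsis(times, scores, max_error):
    """Greedily keep only the breakpoints needed to stay within max_error.

    Assumption: times are strictly increasing snapshot timestamps and scores
    are the (already normalized) PageRank scores of one page at those times.
    """
    if not times or len(times) != len(scores):
        raise ValueError("times and scores must be non-empty and of equal length")
    synopsis = [(times[0], scores[0])]
    start, n = 0, len(times)
    while start < n - 1:
        end = start + 1
        # Extend the current segment until the next extension would violate the tolerance.
        while end + 1 < n and _fits(times, scores, start, end + 1, max_error):
            end += 1
        synopsis.append((times[end], scores[end]))
        start = end
    return synopsis


def reconstruct(synopsis, t):
    """Linearly interpolate the score at time t, which need not be a snapshot time."""
    stamps = [point[0] for point in synopsis]
    if t < stamps[0] or t > stamps[-1]:
        raise ValueError("t lies outside the page's recorded lifetime")
    i = bisect_right(stamps, t) - 1
    if i == len(synopsis) - 1:
        return synopsis[-1][1]
    (t0, s0), (t1, s1) = synopsis[i], synopsis[i + 1]
    return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


# Hypothetical score history: five snapshots collapse to three breakpoints.
synopsis = build_synopsis([0, 1, 2, 3, 4], [1.0, 2.0, 3.0, 2.0, 1.0], max_error=0.01)
score_between_snapshots = reconstruct(synopsis, 1.5)
```

In this sketch a larger `max_error` yields fewer stored breakpoints at the cost of less accurate reconstruction, which mirrors the trade-off between accuracy and space-economy that the abstract mentions.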