How Recent is a Web Document?

  • Authors:
  • Bo Hu; Florian Lauck; Jan Scheffczyk

  • Affiliations:
  • Universität der Bundeswehr München, Munich, Germany; Universität der Bundeswehr München, Munich, Germany; Universität der Bundeswehr München, Munich, Germany

  • Venue:
  • Electronic Notes in Theoretical Computer Science (ENTCS)
  • Year:
  • 2006

Abstract

One of the most important aspects of a Web document is its up-to-dateness, or recency. Up-to-dateness is particularly relevant to Web documents because they usually contain content that originates from different sources and is refreshed at different times. Whether a Web document is relevant for a reader depends on the history of its contents and on so-called external factors, i.e., the up-to-dateness of semantically related documents. In this paper, we address the automatic management of up-to-dateness for Web documents that are managed by an XML-centric Web content management system. First, the freshness of a single document is computed, taking into account its change history. A document metric estimates the distance between different versions of a document. Second, the up-to-dateness of a document is determined based on its own history and the historical evolution of semantically related documents.
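
The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual metric or formulas, so everything below is an assumption made for illustration: the distance is a plain text-similarity measure from Python's `difflib` (not the paper's XML-aware document metric), the exponential discount with `half_life_days` is a hypothetical way to weight the change history, and `up_to_dateness` with its `weight` parameter is a hypothetical way to blend in related documents.

```python
import difflib


def version_distance(old: str, new: str) -> float:
    """Illustrative distance in [0, 1] between two versions of a document.

    Assumption: a simple character-level similarity stands in for the
    paper's document metric, which is not specified in the abstract.
    """
    return 1.0 - difflib.SequenceMatcher(None, old, new).ratio()


def freshness(history, now, half_life_days=30.0):
    """Hypothetical freshness score computed from a change history.

    history: list of (time_in_days, text) pairs, sorted by time.
    Each change contributes its distance to the previous version,
    discounted by how long ago the change happened.
    """
    score = 0.0
    for (_, prev_text), (t, text) in zip(history, history[1:]):
        age = now - t
        score += version_distance(prev_text, text) * 0.5 ** (age / half_life_days)
    return score


def up_to_dateness(own_freshness, related_freshness, weight=0.5):
    """Hypothetical blend of a document's own freshness with the mean
    freshness of its semantically related documents."""
    if not related_freshness:
        return own_freshness
    mean_related = sum(related_freshness) / len(related_freshness)
    return weight * own_freshness + (1.0 - weight) * mean_related


if __name__ == "__main__":
    recently_changed = [(0, "a b c"), (90, "x y z")]
    changed_long_ago = [(0, "a b c"), (10, "x y z")]
    own = freshness(recently_changed, now=100)
    related = [freshness(changed_long_ago, now=100)]
    print(own, related, up_to_dateness(own, related))
```

In this sketch a large change made recently raises the score more than the same change made long ago, and a document surrounded by stale related documents scores lower than one whose related documents were also changed recently. Other weightings would be equally defensible.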