Modelling information persistence on the web

  • Authors: Daniel Gomes; Mário J. Silva
  • Affiliation: Universidade de Lisboa, Faculdade de Ciências, Portugal (both authors)
  • Venue: ICWE '06, Proceedings of the 6th International Conference on Web Engineering
  • Year: 2006

Abstract

Models of web data persistency are essential tools for the design of efficient information extraction systems that repeatedly collect and process the data. This study models the persistence of web data through the measurement of URL and content persistence across several snapshots of a national community web, collected for 3 years. We found that the lifetimes of URLs and contents are modelled by logarithmic functions. We gathered statistics on the structure of the web, identified reasons for URL death and characterized persistent URLs and contents. The lasting contents tend to be referenced by different URLs during their lifetime, while half of the contents referenced by persistent URLs do not change.
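
To make the measurement idea concrete, the following is a minimal sketch of how URL persistence across snapshots could be computed and then fitted with a logarithmic function. It is an illustration only and not the authors' method: the function names `url_persistence` and `fit_log`, the functional form `a * ln(age) + b`, and the representation of a snapshot as a set of URL strings are all assumptions made for this example, and no data from the study is used.

```python
import math


def url_persistence(snapshots):
    """Fraction of URLs from the first snapshot still present in each later one.

    `snapshots` is a list of sets of URL strings, oldest first (an assumed
    representation, chosen for this sketch).
    """
    baseline = snapshots[0]
    return [len(baseline & snap) / len(baseline) for snap in snapshots[1:]]


def fit_log(ages, values):
    """Ordinary least-squares fit of values ~ a * ln(age) + b.

    `ages` must be strictly positive (for example, days since the first
    snapshot); the unit is an assumption of this sketch.
    """
    xs = [math.log(t) for t in ages]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b
```

Given the persistence fractions returned by `url_persistence` and the age of each later snapshot, `fit_log` yields the two coefficients of the fitted curve. The same procedure would apply to content persistence if each snapshot were represented as a set of content fingerprints instead of URLs.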