In prior work we demonstrated that search engine caches and archiving projects such as the Internet Archive's Wayback Machine can be used to "lazily preserve" websites and to reconstruct them when they are lost. We use the term "web repositories" for collections of automatically refreshed and migrated content, and we refer to these repositories collectively as the "web infrastructure". In this paper we present a framework for describing web repositories and the status of the web resources they hold. The framework includes an abstract API for interacting with web repositories, the distinction between deep and flat repositories, the classification of repositories as light, dark, or grey, and terminology for describing the recoverability of a web resource. Our API may serve as a foundation for future web repository interfaces.
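
To make the notion of an abstract repository API concrete, the sketch below shows, in Python, one possible shape such an interface could take. All names here (WebRepository, StoredCopy, get_copy, and so on) are hypothetical illustrations for this summary, not the operations actually defined in the paper.

    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from datetime import datetime
    from typing import List, Optional

    @dataclass
    class StoredCopy:
        """One stored copy of a web resource held by a repository."""
        uri: str              # original URI of the resource
        stored_at: datetime   # when the repository acquired this copy
        mime_type: str        # media type of the stored representation
        content: bytes        # the stored representation itself

    class WebRepository(ABC):
        """Hypothetical interface that a web repository (a search
        engine cache, a web archive, etc.) might expose to a website
        reconstruction tool."""

        @abstractmethod
        def has_copy(self, uri: str) -> bool:
            """Report whether the repository holds any copy of uri."""

        @abstractmethod
        def get_copy(self, uri: str,
                     near: Optional[datetime] = None) -> Optional[StoredCopy]:
            """Return the copy closest in time to `near`, or the most
            recent copy if `near` is None. A deep repository stores
            many copies over time; a flat repository keeps only the
            most recent one, so `near` is effectively ignored there."""

        @abstractmethod
        def list_copies(self, uri_prefix: str) -> List[str]:
            """Enumerate stored URIs beginning with uri_prefix, e.g.
            every page the repository holds for one website."""

Under this sketch, a reconstruction tool could treat every repository uniformly, querying each one for a lost URI and keeping the best copy returned, whether the source is a flat search engine cache or a deep archive like the Wayback Machine.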