Usage analysis of a public website reconstruction tool

Authors:
Frank McCown;Michael L. Nelson
Affiliations:
Harding University, Searcy, AR, USA;Old Dominion University, Norfolk, VA, USA
Venue:
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Year:
2008

Citing 6
Cited 2

Just-in-time recovery of missing web pages

Proceedings of the seventeenth conference on Hypertext and hypermedia
Evaluation of crawling policies for a web-repository crawler

Proceedings of the seventeenth conference on Hypertext and hypermedia
Lazy preservation: reconstructing websites by crawling the crawlers

WIDM '06 Proceedings of the 8th annual ACM international workshop on Web information and data management
Characterization of national Web domains

ACM Transactions on Internet Technology (TOIT)
Factors affecting website reconstruction from the web infrastructure

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Why web sites are lost (and how they're sometimes found)

Communications of the ACM - Scratch Programming for All

Recovering a website's server components from the web infrastructure

Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
A framework for describing web repositories

Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries

Quantified Score

Hi-index	0.00

Visualization

Abstract

The Web is increasingly the medium by which information is published today, but due to its ephemeral nature, web pages and sometimes entire websites are often "lost" due to server crashes, viruses, hackers, run-ins with the law, bankruptcy and loss of interest. When a website is lost and backups are unavailable, an individual or third party can use Warrick to recover the website from several search engine caches and web archives (the Web Infrastructure). In this short paper, we present Warrick usage data obtained from Brass, a queueing system for Warrick hosted at Old Dominion University and made available to the public for free. Over the last six months, 520 individuals have reconstructed more than 700 websites with 800K resources from the Web Infrastructure. Sixty-two percent of the static web pages were recovered, and 41% of all website resources were recovered. The Internet Archive was the largest contributor of recovered resources (78%).