The Web is increasingly the medium by which information is published, but because of its ephemeral nature, web pages and sometimes entire websites are often "lost" to server crashes, viruses, hackers, run-ins with the law, bankruptcy, and loss of interest. When a website is lost and backups are unavailable, an individual or third party can use Warrick to recover the website from several search engine caches and web archives (the Web Infrastructure). In this short paper, we present Warrick usage data obtained from Brass, a queueing system for Warrick that is hosted at Old Dominion University and freely available to the public. Over the last six months, 520 individuals have reconstructed more than 700 websites with 800K resources from the Web Infrastructure. Sixty-two percent of the static web pages were recovered, and 41% of all website resources were recovered. The Internet Archive was the largest contributor of recovered resources (78%).