When retrieving archived copies of web resources (mementos) from a web archive, the URI of the original resource (URI-R) is typically used as the lookup key. This is straightforward until the resource on the live web issues a redirect, R → R′. It is then unclear whether R or R′ should be used as the lookup key. In this paper, we report on a quantitative study that evaluates a set of policies to help the client discover the correct memento when faced with redirection. We studied the stability of 10,000 resources and found that 48% of the sampled URIs were not stable with respect to their status and redirection location. We define the reliability of a resource as the number of its mementos with successful responses divided by its total number of mementos. By this measure, 27% of the resources were not perfectly reliable, and 2% had a reliability score of less than 0.5. We tested two retrieval policies. The first policy covered resources that currently issue redirects; it successfully resolved 17 out of 77 URIs for which the archive held no mementos of the original URI but did hold mementos of the redirect target. The second policy covered archived copies with HTTP redirection and, in 58% of the cases tested, helped the client discover the memento nearest to the requested datetime.
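The reliability score and the fallback idea behind the first policy can be illustrated with a minimal sketch. It is not the study's implementation. The names `reliability`, `choose_lookup_key`, and `has_mementos` are hypothetical, a "successful response" is assumed here to mean an archived 2xx status, and the archive lookup is passed in as a callable so that no real archive API is implied.

```python
from typing import Callable, Optional, Sequence


def reliability(statuses: Sequence[int]) -> float:
    """Share of a resource's mementos whose archived status is a success.

    Assumption: "successful" means an HTTP 2xx status code.
    """
    if not statuses:
        raise ValueError("no mementos to score")
    successes = sum(1 for status in statuses if 200 <= status < 300)
    return successes / len(statuses)


def choose_lookup_key(
    original: str,
    redirect_target: Optional[str],
    has_mementos: Callable[[str], bool],
) -> Optional[str]:
    """Pick the URI to use as the archive lookup key.

    Tries the original URI (R) first and falls back to the live-web
    redirect target (R') only when the archive holds nothing for R.
    `has_mementos` is a hypothetical stand-in for an archive query.
    """
    if has_mementos(original):
        return original
    if redirect_target is not None and has_mementos(redirect_target):
        return redirect_target
    return None
```

For example, a resource with archived statuses 200, 200, 404, and 301 scores 0.5 under this definition, and a URI with no mementos of its own resolves to its redirect target whenever the archive holds copies of that target.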