Archival HTTP redirection retrieval policies

Authors:
Ahmed AlSum; Michael L. Nelson; Robert Sanderson; Herbert Van de Sompel
Affiliations:
Old Dominion University, Norfolk, VA, USA; Old Dominion University, Norfolk, VA, USA; Los Alamos National Laboratory, Los Alamos, NM, USA; Los Alamos National Laboratory, Los Alamos, NM, USA
Venue:
Proceedings of the 22nd international conference on World Wide Web companion
Year:
2013

Abstract

When retrieving archived copies of web resources (mementos) from a web archive, the URI of the original resource (URI-R) is typically used as the lookup key. This is straightforward until the resource on the live web issues a redirect: R -> R'. It is then unclear whether R or R' should be used as the lookup key. In this paper, we report on a quantitative study that evaluates a set of policies to help the client discover the correct memento when faced with redirection. We studied the stability of 10,000 resources and found that 48% of the sample URIs were not stable with respect to their status and redirection location. 27% of the resources were not perfectly reliable, where reliability is the number of mementos with successful responses divided by the total number of mementos, and 2% had a reliability score of less than 0.5. We tested two retrieval policies. The first policy covered resources that currently issue redirects; it successfully resolved 17 of the 77 URIs that had no mementos of the original URI but did have mementos of the URI being redirected to. The second policy covered archived copies with HTTP redirection and helped the client discover the nearest memento to the requested datetime in 58% of the cases tested.
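
The two quantities described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `has_mementos` callback, and the choice to count only 2xx status codes as "successful" are assumptions made for this example; the abstract does not define them. The fallback shown is one plausible reading of the first policy: use the original URI when the archive holds mementos for it, otherwise try the live-web redirect target.

```python
from typing import Callable, Optional, Sequence


def reliability(statuses: Sequence[int]) -> float:
    """Share of a resource's mementos that recorded a successful response.

    Assumption: "successful" means an archived HTTP status in the 2xx range.
    """
    if not statuses:
        raise ValueError("resource has no mementos")
    successful = sum(1 for status in statuses if 200 <= status < 300)
    return successful / len(statuses)


def choose_lookup_key(
    original_uri: str,
    redirect_target: Optional[str],
    has_mementos: Callable[[str], bool],
) -> Optional[str]:
    """Pick the URI to use as the archive lookup key.

    Hypothetical policy: prefer the original URI; fall back to the
    live-web redirect target only when the original has no mementos.
    """
    if has_mementos(original_uri):
        return original_uri
    if redirect_target is not None and has_mementos(redirect_target):
        return redirect_target
    return None


# Toy archive index standing in for a real web archive lookup.
archived = {"http://example.org/new"}
key = choose_lookup_key(
    "http://example.org/old",
    "http://example.org/new",
    lambda uri: uri in archived,
)
print(key)
print(reliability([200, 200, 404, 301]))
```

In the toy data, the original URI has no mementos, so the lookup falls back to the redirect target. Two of the four archived status codes are in the 2xx range, which gives a reliability score of 0.5.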