A survey of web caching schemes for the Internet
ACM SIGCOMM Computer Communication Review
Keeping Up with the Changing Web
Computer
The Evolution of the Web and Implications for an Incremental Crawler
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
What's new on the web?: the evolution of the web from a search engine perspective
Proceedings of the 13th international conference on World Wide Web
A large-scale study of the evolution of web pages
Software—Practice & Experience - Special issue: Web technologies
Squid: The Definitive Guide
Lazy preservation: reconstructing websites by crawling the crawlers
WIDM '06 Proceedings of the 8th annual ACM international workshop on Web information and data management
Improving web server performance by caching dynamic data
USITS'97 Proceedings of the USENIX Symposium on Internet Technologies and Systems on USENIX Symposium on Internet Technologies and Systems
Rate of change and other metrics: a live study of the world wide web
USITS'97 Proceedings of the USENIX Symposium on Internet Technologies and Systems on USENIX Symposium on Internet Technologies and Systems
Architecture of the internet archive
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Active cache: caching dynamic contents on the Web
Middleware '98 Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing
How much of the web is archived?
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Hi-index | 0.00 |
As defined by the Memento Framework, TimeMaps are machine-readable lists of time-specific copies -- called "mementos" -- of an archived original resource. In theory, as an archive acquires additional mementos over time, a TimeMap should be monotonically increasing. However, there are reasons why the number of mementos in a TimeMap would decrease, for example: archival redaction of some or all of the mementos, archival restructuring, and transient errors of one or more archives. We study TimeMaps for 4,000 original resources over a three month period, note their change patterns, and develop a caching algorithm for TimeMaps suitable for a reverse proxy in front of a Memento aggregator. We show that TimeMap cardinality is constant or monotonically increasing for 80.2% of all TimeMap downloads in the observation period. The goal of the caching algorithm is to exploit the ideally monotonically increasing nature of TimeMaps and not cache responses with fewer mementos than the already cached TimeMap. This new caching algorithm uses conditional cache replacement and a Time To Live (TTL) value to ensure the user has access to the most complete TimeMap available. Based on our empirical data, a TTL of 15 days will minimize the number of mementos missed by users, and minimize the load on archives contributing to TimeMaps.