A significant percentage of Web objects are replicas. For example, a vast majority of image files such as banners, buttons, and logos are duplicated throughout the WWW. Nevertheless, Web caching systems generally treat the replicas as different objects because they have different URLs. In this paper, we propose a simple and efficient way to manage the replicated objects for Web proxy caches. In the proposed scheme, the MD5 checksum, together with the size of an object, forms an identifier of a Web object that can distinguish replicas. Experimental results show that the proposed scheme significantly improves the cache hit rate and the byte hit rate by removing the redundant objects from the cache and reflecting the popularity of objects more precisely.
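The identifier described above can be illustrated with a minimal sketch. It assumes a cache that maps each URL to an (MD5 checksum, size) pair and stores each distinct body only once. The class name `DedupCache`, the helper `content_id`, the byte-capacity limit, and the LRU replacement policy are all hypothetical choices made for illustration; the abstract does not specify the replacement policy or the data structures used.

```python
import hashlib
from collections import OrderedDict
from typing import Dict, Optional, Tuple

# (MD5 hex digest, size in bytes): the replica identifier from the abstract.
ContentId = Tuple[str, int]


def content_id(body: bytes) -> ContentId:
    """Build the identifier from the object's MD5 checksum and its size."""
    return hashlib.md5(body).hexdigest(), len(body)


class DedupCache:
    """Hypothetical proxy cache that stores each distinct body once.

    Replacement is plain LRU over content identifiers (an assumption),
    so hits on any replica URL refresh the same shared entry.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.url_to_id: Dict[str, ContentId] = {}
        self.objects: "OrderedDict[ContentId, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        cid = self.url_to_id.get(url)
        # A URL may still point at a body that has since been evicted.
        if cid is None or cid not in self.objects:
            return None
        self.objects.move_to_end(cid)  # mark as most recently used
        return self.objects[cid]

    def put(self, url: str, body: bytes) -> None:
        cid = content_id(body)
        if cid[1] > self.capacity_bytes:
            return  # object can never fit; do not cache it
        self.url_to_id[url] = cid
        if cid in self.objects:
            # Replica of a cached object: share the entry, store nothing new.
            self.objects.move_to_end(cid)
            return
        self.objects[cid] = body
        self.used_bytes += len(body)
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self.objects.popitem(last=False)  # least recently used
            self.used_bytes -= len(evicted)
```

In this sketch, storing the same logo under two different URLs consumes space for one copy only, and a request for either URL refreshes the shared entry, which is one way the popularity of a replicated object could be counted across all of its URLs.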