This paper describes a new technique for measuring Web client request patterns and analyzes a large client trace collected using the new method. In this approach, a modified proxy intercepts requests and marks every response it serves to clients as uncacheable, effectively disabling browser caches and allowing the proxy to record requests that would otherwise be satisfied silently by browser cache hits. WebTV Networks used such a 'cache-busting proxy' to collect an unusually large and detailed anonymized Web client trace in September 2000. The trace contains over 347 million requests for over 36 million documents by over 37,000 clients and spans 16 days. By most measures, it is two orders of magnitude larger than existing Web client traces. We compare cache-busting proxies with conventional client instrumentation and use the WebTV trace to explore browser cache performance, reference locality, and document aliasing. We present the aggregate browser cache success function (hit rate vs. cache size) of the entire client population and discuss design implications for memory- and bandwidth-constrained Web clients. For the workload studied, eliminating redundant data transfers would increase browser cache hit rates by 35-45% over their current levels. We describe a simple and practical technique for eliminating redundant transfers. Document sharing across client reference streams is so strong that the hit rate of a shared proxy cache could exceed 57% even if browser caches were infinitely large.
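
The cache-busting step can be illustrated with a minimal sketch. The abstract does not say which headers the WebTV proxy rewrote, so the header choices below are assumptions based on standard HTTP caching directives, and the function name `bust_cache` is hypothetical. The idea is only that the proxy strips any freshness information from a response and replaces it with directives telling the browser not to reuse it.

```python
# Hypothetical illustration of "cache busting"; not the WebTV proxy's code.
# Assumption: freshness is controlled by these standard HTTP response headers.
FRESHNESS_HEADERS = {"cache-control", "expires", "pragma"}


def bust_cache(headers):
    """Return a copy of the response headers, rewritten as uncacheable.

    `headers` is a list of (name, value) pairs as received from the
    origin server. Header names are matched case-insensitively.
    """
    # Drop whatever freshness information the origin server supplied.
    rewritten = [
        (name, value)
        for name, value in headers
        if name.lower() not in FRESHNESS_HEADERS
    ]
    # Add directives asking the browser not to store or reuse the response,
    # so every later reference to the document reaches the proxy.
    rewritten.append(("Cache-Control", "no-store, no-cache, must-revalidate"))
    rewritten.append(("Pragma", "no-cache"))  # for HTTP/1.0 clients
    rewritten.append(("Expires", "0"))        # invalid date, treated as expired
    return rewritten


origin_headers = [
    ("Content-Type", "text/html"),
    ("Cache-Control", "max-age=3600"),
    ("Expires", "Fri, 01 Dec 2000 16:00:00 GMT"),
]
busted = bust_cache(origin_headers)
```

Because every repeated reference now produces a request at the proxy, the proxy log can capture references that a working browser cache would have hidden.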