This paper describes a methodology for very efficient characterization of a workload's memory access properties using the least recently used (LRU) replacement policy. The resulting access reuse profile captures the working set sizes of a workload and can be used to characterize the locality of its data references and to predict its general caching behavior. The approach discussed in this paper is flexible and can be used in conjunction with tracing or execution-driven techniques. Because the proposed algorithm is efficient, processing over one million memory accesses per second, LRU profiles can be collected for a large number of workloads, and the resulting data can be used in the early stages of computer system design. We illustrate the method with data collected for the NAS Parallel Benchmarks. For selected benchmarks we compare the miss rate profiles for various workload sizes. We also compare the resulting LRU profiles with point predictions of miss rates generated with conventional cache simulations and observe a good match. In the concluding part of the paper we report the performance results for the proposed method.
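To make the idea of an LRU reuse profile concrete, the following is a minimal sketch and not the paper's algorithm. It uses a naive list-based LRU stack, which is far slower than the method the abstract describes, and assumes a fully associative cache whose capacity is counted in distinct addresses. The function names `lru_stack_distances` and `miss_rate_profile` and the toy trace are hypothetical, chosen only for illustration.

```python
from collections import Counter


def lru_stack_distances(trace):
    """Yield the LRU stack distance of each access (None for a first touch)."""
    stack = []  # most recently used address sits at index 0
    for addr in trace:
        if addr in stack:
            depth = stack.index(addr)  # 0-based position in the LRU stack
            stack.pop(depth)
            yield depth + 1            # distance 1 = re-access of the MRU item
        else:
            yield None                 # cold (compulsory) miss
        stack.insert(0, addr)          # accessed address becomes the MRU item


def miss_rate_profile(trace, sizes):
    """Map each cache size (in distinct addresses) to its LRU miss rate."""
    hist = Counter(lru_stack_distances(trace))
    total = sum(hist.values())
    profile = {}
    for size in sizes:
        # An access hits in an LRU cache of this size iff its distance <= size.
        hits = sum(n for d, n in hist.items() if d is not None and d <= size)
        profile[size] = (total - hits) / total if total else 0.0
    return profile


# Toy trace (an assumption for illustration): a loop over three addresses.
trace = ["a", "b", "c", "a", "b", "c", "a", "d"]
profile = miss_rate_profile(trace, sizes=[1, 2, 3, 4])
print(profile)
```

A single pass over the trace yields miss rates for every cache size at once. In this toy trace the miss rate drops only once the cache holds three addresses, which corresponds to the working set of the loop.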