Working set characterization of applications with an efficient LRU algorithm

Authors:
Lodewijk Bonebakker;Andrew Over;Ilya Sharapov
Affiliations:
Sun Microsystems Laboratories;Australian National University;Sun Microsystems Laboratories
Venue:
EPEW'06 Proceedings of the Third European conference on Formal Methods and Stochastic Models for Performance Evaluation
Year:
2006

Abstract

This paper describes a methodology for very efficient characterization of a workload's memory access properties using the least recently used (LRU) replacement policy. The resulting access reuse profile captures the working set sizes of a workload and can be used to characterize the locality of its data references and to predict its general caching behavior. The approach discussed in this paper is flexible and can be used in conjunction with tracing or execution-driven techniques. Because the proposed algorithm is efficient, processing over one million memory accesses per second, LRU profiles can be collected for a large number of workloads, and the resulting data can be used in the early stages of computer system design. We illustrate the method with data collected for the NAS Parallel Benchmarks. For selected benchmarks we compare the miss rate profiles for various sizes of the workload. We also compare the resulting LRU profiles with point predictions of miss rates generated with conventional cache simulations and observe a good match. In the concluding part of the paper we report the performance results for the proposed method.
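To make the idea of an LRU reuse profile concrete, the following is a minimal Python sketch of the textbook LRU stack distance computation. It is not the efficient algorithm proposed in the paper, which the abstract does not detail. This naive version costs time linear in the stack depth per access and is meant only to show what the profile measures. The function names `lru_stack_distances` and `miss_rate_profile` are hypothetical, and the cache is assumed to be fully associative, with capacity counted in distinct addresses.

```python
from collections import Counter


def lru_stack_distances(trace):
    """Return the LRU stack distance of every access in the trace.

    The distance is the number of distinct addresses referenced since the
    previous access to the same address, counting that address itself.
    First-time accesses have no distance and are reported as None.
    """
    stack = []  # least recently used at index 0, most recently used at the end
    distances = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            distances.append(len(stack) - idx)
            stack.pop(idx)
        else:
            distances.append(None)  # cold (compulsory) miss
        stack.append(addr)  # addr becomes the most recently used entry
    return distances


def miss_rate_profile(trace, sizes):
    """Miss rate of a fully associative LRU cache for each capacity in sizes.

    An access hits in a cache of capacity C exactly when its stack distance
    is at most C, so one pass over the trace yields the whole profile.
    """
    distances = lru_stack_distances(trace)
    total = len(distances)
    if total == 0:
        return {size: 0.0 for size in sizes}
    hist = Counter(d for d in distances if d is not None)
    profile = {}
    for size in sizes:
        hits = sum(count for d, count in hist.items() if d <= size)
        profile[size] = (total - hits) / total
    return profile


if __name__ == "__main__":
    # Illustrative trace: a cyclic sweep over three addresses.
    trace = ["a", "b", "c", "a", "b", "c"]
    print(miss_rate_profile(trace, [1, 2, 3]))
```

For the cyclic trace above, every reuse has stack distance 3, so caches of capacity 1 or 2 miss on every access. A capacity of 3 leaves only the three cold misses. The step in the miss rate at capacity 3 marks the working set size, which is the kind of information an LRU profile exposes.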