Though the performance of many scientific codes is dominated by memory behavior, our ability to describe, capture, compare, and recreate that behavior is quite limited. This inability underlies much of the complexity in the field of performance analysis: it is fundamentally difficult to relate benchmarks and applications or use realistic workloads to guide system design and procurement. An observable, reproducible, and machine-independent memory characterization is needed. The Chameleon framework is a software suite that includes tools to capture a concise, machine-independent memory signature from any application and produce synthetic memory address traces that mimic that signature. By simultaneously modeling both spatial and temporal locality, Chameleon produces uniquely accurate, general-purpose synthetic traces. Our results demonstrate that the cache hit rates generated by each synthetic trace are nearly identical to those of the application it targets on dozens of memory hierarchies representing many of today's commercial offerings. We apply the framework to high-performance computing (HPC) by leveraging sampling techniques to capture the memory signatures of full-scale, parallel applications with only a 5x slowdown. The overall result is therefore a concise, observable, and machine-independent representation of the memory requirements of full-scale applications that can be tractably captured and accurately mimicked.
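To make the idea of a machine-independent memory characterization concrete, here is a minimal sketch that summarizes an address trace by its LRU reuse-distance histogram and derives hit rates for idealized caches of different sizes. It is an illustration only, not Chameleon's signature format or algorithm: the function names, the 64-byte block size, and the fully associative LRU cache model are all assumptions made for this example, and the sketch captures temporal locality at block granularity without Chameleon's joint spatial and temporal modeling.

```python
from collections import OrderedDict


def reuse_distance_histogram(trace, block_bytes=64):
    """Summarize a trace of byte addresses by LRU reuse distance.

    Hypothetical helper for illustration. Returns (hist, cold), where
    hist[d] counts accesses whose block was last touched d distinct
    blocks ago, and cold counts first-time accesses.
    """
    stack = OrderedDict()  # blocks in recency order, most recent last
    hist = {}
    cold = 0
    for addr in trace:
        block = addr // block_bytes  # assumed block size, not from the source
        if block in stack:
            keys = list(stack.keys())
            # Number of distinct blocks touched since the last use of `block`.
            dist = len(keys) - 1 - keys.index(block)
            hist[dist] = hist.get(dist, 0) + 1
            stack.move_to_end(block)
        else:
            cold += 1
            stack[block] = None
    return hist, cold


def lru_hit_rate(hist, cold, cache_blocks):
    """Hit rate of an idealized fully associative LRU cache of `cache_blocks` blocks."""
    total = sum(hist.values()) + cold
    hits = sum(count for dist, count in hist.items() if dist < cache_blocks)
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Cyclic sweep over three blocks: hits only once all three blocks fit.
    trace = [0, 64, 128, 0, 64, 128, 0]
    hist, cold = reuse_distance_histogram(trace)
    for size in (1, 2, 3, 4):
        print(size, lru_hit_rate(hist, cold, size))
```

The histogram is computed once from the trace and then queried for any cache size, which is why a summary of this kind does not depend on a particular machine. The linear scan per access keeps the sketch short; a tree-based structure would be needed for traces of realistic length.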