Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
Mache: no-loss trace compaction
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Blocking: exploiting spatial locality for trace compaction
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A model for estimating trace-sample miss ratios
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
An algorithm for off-line detection of phases in execution profiles
Proceedings of the 7th international conference on Computer performance evaluation : modelling techniques and tools: modelling techniques and tools
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation
IEEE Transactions on Computers
Automatic node selection for high performance applications on networks
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Trace reduction for virtual memory simulations
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The AppLeS parameter sweep template: user-level middleware for the grid
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
An efficient profile-analysis framework for data-layout optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
IEEE Transactions on Computers
IPS-2: The Second Generation of a Parallel Program Measurement System
IEEE Transactions on Parallel and Distributed Systems
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
A framework for performance modeling and prediction
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Compact application signatures for parallel and distributed scientific codes
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Master/Slave Computing on the Grid
HCW '00 Proceedings of the 9th Heterogeneous Computing Workshop
Metascheduling: A Scheduling Model for Metacomputing Systems
HPDC '98 Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing
Characterizing and Predicting Program Behavior and its Variability
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Skeleton based performance prediction on shared networks
CCGRID '04 Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
Modeling application performance by convolving machine signatures with application profiles
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Exploiting stability to reduce time-space cost for memory tracing
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Automatic Construction and Evaluation of Performance Skeletons
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Performance prediction with skeletons
Cluster Computing
Construction and evaluation of coordinated performance skeletons
HiPC'08 Proceedings of the 15th international conference on High performance computing
Hi-index | 0.00 |
This paper introduces a method to monitor an application and generate a short synthetic "memory skeleton" program whose memory access pattern is representative of the application. In particular, the application and its memory skeleton should have similar cache behavior on any memory hierarchy architecture. The objective is to quickly estimate the cache performance of an application on any memory architecture by running its memory skeleton. The paper presents and validates a framework for automatic construction of memory skeletons. The approach is based on sampling the address trace of an executing application, summarizing it, and then employing it to generate a synthetic memory skeleton program. The broad goal of this research is construction of "performance skeletons" designed to quickly estimate the performance of a large application in an unpredictable environment. A performance skeleton must also mimic the communication and execution behavior of the application. However, the memory behavior drives the performance of many scientific applications and hence memory skeletons are a critical component of this approach to performance estimation.