Apex-Map: A Global Data Access Benchmark to Analyze HPC Systems and Parallel Programming Paradigms
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Quantifying Locality In The Memory Access Patterns of HPC Applications
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Characteristics of workloads used in high performance and technical computing
Proceedings of the 21st annual international conference on Supercomputing
A genetic algorithms approach to modeling the performance of memory-bound computations
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Working set characterization of applications with an efficient LRU algorithm
EPEW'06 Proceedings of the Third European conference on Formal Methods and Stochastic Models for Performance Evaluation
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Versatile refresh: low complexity refresh scheduling for high-throughput multi-banked eDRAM
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Parallel application characterization with quantitative metrics
Concurrency and Computation: Practice & Experience
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Hi-index | 0.00 |
A simple, tunable, synthetic benchmark with a performance directly related to applications would be of great benefit to the scientific computing community. In this paper, we present a novel approach to develop such a benchmark. The initial focus of this project is on data access performance of scientific applications. First a hardware independent characterization of code performance in terms of address streams is developed. The parameters chosen to characterize a single address stream are related to regularity, size, spatial, and temporal locality. These parameters are then used to implement a synthetic benchmark program that mimics the performance of a corresponding code. To test the validity of our approach we performed experiments using five test kernels on six different platforms. The performance of most of our test kernels can be approximated by a single synthetic address stream. However in some cases overlapping two address streams is necessary to achieve a good approximation.