The measurement of locality and the behaviour of programs
The Computer Journal
Memory access patterns of parallel scientific programs
SIGMETRICS '87 Proceedings of the 1987 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
Compiler-directed data prefetching in multiprocessors with memory hierarchies
ICS '90 Proceedings of the 4th international conference on Supercomputing
Run-time spatial locality detection and optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
ACM Computing Surveys (CSUR)
Properties of the working-set model
Communications of the ACM
The working set model for program behavior
Communications of the ACM
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
A measure of program locality and its application
SIGMETRICS '84 Proceedings of the 1984 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Simulation Analysis Data Sharing in Shared Memory Multiprocessors
Simulation Analysis Data Sharing in Shared Memory Multiprocessors
Architecture Independent Performance Characterization and Benchmarking for Scientific Applications
MASCOTS '04 Proceedings of the The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Quantifying Locality In The Memory Access Patterns of HPC Applications
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
A case study in top-down performance estimation for a large-scale parallel application
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Scientific applications vs. SPEC-FP: a comparison of program behavior
Proceedings of the 20th annual international conference on Supercomputing
The HPC Challenge (HPCC) benchmark suite
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A performance prediction framework for scientific applications
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
A scalable, commodity data center network architecture
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
Hi-index | 0.00 |
This paper provides a systematic comparison of various characteristics of computationally-intensive workloads. Our analysis focuses on standard HPC benchmarks and representative applications. For the selected workloads we provide a wide range of characterizations based on instruction tracing and hardware counter measurements. Each workload is analyzed at the instruction level by comparing the dynamic distribution of executed instructions. We also analyze memory access patterns including various aspects of cache utilization and locality properties of address distributions. Since prefetching plays an important role in the performance of computational workloads, we explore the prefetching potential and for parallel workloads we study the sharing properties of memory accesses. For the purpose of completeness, HPC workloads are compared to two commonly used commercial computing benchmarks. The results of this work show that the HPC application space is surprisingly diverse, with some codes showing similar data sharing and locality properties with commercial applications. The wide range of studies presented in this paper are instrumental in uncovering the diversity of this application space.