We use a hybrid methodology based on binary instrumentation and performance counters to characterize a set of proxy applications (mini-apps and PETSc applications) that represent a broad range of scientific applications, in particular DOE's future high-performance computing workloads. From this empirical basis, we build statistical models that extrapolate application properties (instruction mix, memory size, and memory bandwidth) as a function of problem size. We validate the models and use them to project the first quantitative characterization of an exascale computing workload. Finally, we use the projected workload to evaluate a radical new exascale architecture: stacked DRAM with processor under memory (PUM). We make two projections. One shows major potential benefits from PUM. The second, more conservative projection suggests that only a small number of exascale applications are likely to be memory-bandwidth limited, and that even these are fundamentally memory-capacity limited.
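
To illustrate what extrapolating an application property as a function of problem size can look like, here is a minimal sketch. It assumes a power-law model, value ≈ a · size^b, fitted by least squares in log-log space. The abstract does not state the functional form of the models, so the power law, the function names, and the synthetic measurements are all assumptions made for illustration and are not the authors' method or data.

```python
import math


def fit_power_law(sizes, values):
    """Fit values ~= a * sizes**b by ordinary least squares on (log size, log value).

    The power-law form is an assumption made for this sketch.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # exponent: slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # prefactor: exp of the intercept
    return a, b


def extrapolate(a, b, size):
    """Evaluate the fitted model at a problem size outside the measured range."""
    return a * size ** b


# Synthetic (hypothetical) measurements: memory footprint in bytes at four
# small problem sizes, generated from 8 * n**1.5 so the fit can be checked.
sizes = [1e3, 1e4, 1e5, 1e6]
footprint_bytes = [8.0 * n ** 1.5 for n in sizes]

a, b = fit_power_law(sizes, footprint_bytes)
projected = extrapolate(a, b, 1e9)  # project to a much larger problem size
print(f"a = {a:.3f}, b = {b:.3f}, projected footprint = {projected:.3e} bytes")
```

In practice, a fitted model of this kind would be checked against held-out measurements at larger problem sizes before being used for projection, which corresponds to the validation step mentioned above.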