Statistical scalability analysis of communication operations in distributed applications
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Performance Optimization for Large Scale Computing: The Scalable VAMPIR Approach
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
An overview of the BlueGene/L Supercomputer
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Construction and Compression of Complete Call Graphs for Post-Mortem Program Trace Analysis
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
International Journal of High Performance Computing Applications
The Tau Parallel Performance System
International Journal of High Performance Computing Applications
The Design and Implementation of a Domain-Specific Language for Network Performance Testing
IEEE Transactions on Parallel and Distributed Systems
Dispersing proprietary applications as benchmarks through code mutation
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Preserving time in large-scale communication traces
Proceedings of the 22nd annual international conference on Supercomputing
Open | SpeedShop: An open source infrastructure for parallel performance analysis
Scientific Programming - Large-Scale Programming Tools and Environments
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Validity of the single processor approach to achieving large scale computing capabilities
AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing
Journal of Parallel and Distributed Computing
FACT: fast communication trace collection for parallel applications through program slicing
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Construction and evaluation of coordinated performance skeletons
HiPC'08 Proceedings of the 15th international conference on High performance computing
A Scalable and Distributed Dynamic Formal Verifier for MPI Programs
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
A compiler-based communication analysis approach for multiprocessor systems
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
ScalaExtrap: trace-based communication extrapolation for spmd programs
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Introducing the open trace format (OTF)
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Automatic structure extraction from MPI applications tracefiles
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
ScalaExtrap: Trace-based communication extrapolation for SPMD programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Auto-generation of communication benchmark traces
ACM SIGMETRICS Performance Evaluation Review
Elastic and scalable tracing and accurate replay of non-deterministic events
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Portable parallel benchmarks are widely used and highly effective for (a) the evaluation, analysis and procurement of high-performance computing (HPC) systems and (b) quantifying the potential benefits of porting applications for new hardware platforms. Yet, past techniques to synthetically parametrized hand-coded HPC benchmarks prove insufficient for today's rapidly-evolving scientific codes particularly when subject to multi-scale science modeling or when utilizing domain-specific libraries. To address these problems, this work contributes novel methods to automatically generate highly portable and customizable communication benchmarks from HPC applications. We utilize ScalaTrace, a lossless, yet scalable, parallel application tracing framework to collect selected aspects of the run-time behavior of HPC applications, including communication operations and execution time, while abstracting away the details of the computation proper. We subsequently generate benchmarks with identical run-time behavior from the collected traces. A unique feature of our approach is that we generate benchmarks in CONCEPTUAL, a domain-specific language that enables the expression of sophisticated communication patterns using a rich and easily understandable grammar yet compiles to ordinary C+MPI. Experimental results demonstrate that the generated benchmarks are able to preserve the run-time behavior--including both the communication pattern and the execution time---of the original applications. Such automated benchmark generation is particularly valuable for proprietary, export-controlled, or classified application codes: when supplied to a third party, our auto-generated benchmarks ensure performance fidelity but without the risks associated with releasing the original code. This ability to automatically generate performance-accurate benchmarks from parallel applications is novel and without any precedence, to our knowledge.