ScalaTrace: tracing, analysis and modeling of HPC codes at scale

Authors:
Frank Mueller; Xing Wu; Martin Schulz; Bronis R. de Supinski; Todd Gamblin
Affiliations:
Dept. of Computer Science, North Carolina State University, Raleigh, NC; Dept. of Computer Science, North Carolina State University, Raleigh, NC; Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA; Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA; Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA
Venue:
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Year:
2010

Abstract

Characterizing the communication behavior of large-scale applications is difficult and costly because of code and system complexity and long execution times. An alternative to running the actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless, scalable trace collection, we contribute an approach that provides communication traces that are orders of magnitude smaller, if not near constant in size, regardless of the number of nodes, while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events, develop a scheme to preserve the time and causality of communication events, and present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and on trace extrapolation. To the best of our knowledge, such a concise, scalable representation of MPI traces combined with time-preserving deterministic replay of MPI calls is without precedent.
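
The idea behind intra-node compression can be illustrated with a toy sketch. This is not the ScalaTrace implementation: the function names `compress` and `expand`, the `max_window` parameter, and the record layout are all hypothetical, and events are reduced to plain call names without parameters or timing. The sketch only shows why a loop that issues the same MPI calls repeatedly can be stored in near constant space while remaining lossless.

```python
def compress(events, max_window=8):
    """Fold back-to-back repeats of a short pattern into (count, pattern) records.

    Toy stand-in for loop compression of one node's event stream;
    max_window (an assumption of this sketch) bounds the pattern length tried.
    """
    records = []
    i, n = 0, len(events)
    while i < n:
        best_w, best_reps = 1, 1
        for w in range(1, max_window + 1):
            if i + w > n:
                break
            pattern = events[i:i + w]
            reps = 1
            # Count how many times the pattern repeats immediately after itself.
            while events[i + reps * w:i + (reps + 1) * w] == pattern:
                reps += 1
            # Prefer the candidate that covers the most events; ties keep the shorter pattern.
            if reps > 1 and reps * w > best_reps * best_w:
                best_w, best_reps = w, reps
        records.append((best_reps, tuple(events[i:i + best_w])))
        i += best_w * best_reps
    return records


def expand(records):
    """Reverse compress(): replay every pattern the recorded number of times."""
    events = []
    for count, pattern in records:
        events.extend(pattern * count)
    return events


if __name__ == "__main__":
    trace = ["MPI_Init"] + ["MPI_Send", "MPI_Recv"] * 1000 + ["MPI_Finalize"]
    records = compress(trace)
    print(len(trace), "events stored as", len(records), "records")
    assert expand(records) == trace  # the encoding is lossless
```

In this sketch the record count depends on the loop structure, not on the iteration count, which is the property that keeps the trace size nearly constant as runs grow longer. Merging such per-node records across nodes (inter-node compression) and preserving timing are separate steps that the sketch does not attempt.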