Elastic and scalable tracing and accurate replay of non-deterministic events

Authors:
Xing Wu;Frank Mueller
Affiliations:
NCSU, Raleigh, NC, USA;NCSU, Raleigh, NC, USA
Venue:
Proceedings of the 27th international ACM conference on International conference on supercomputing
Year:
2013

Citing 31
Cited 0

Address trace compression through loop detection and reduction

SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Whole program paths

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Statistical scalability analysis of communication operations in distributed applications

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
An Implementation of Interprocedural Bounded Regular Section Analysis

IEEE Transactions on Parallel and Distributed Systems
SIGMA: a simulator infrastructure to guide memory analysis

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Linear-Time, Incremental Hierarchy Inference for Compression

DCC '97 Proceedings of the Conference on Data Compression
Gprof: A call graph execution profiler

SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
VPC3: a fast and effective trace-compression algorithm

Proceedings of the joint international conference on Measurement and modeling of computer systems
Construction and Compression of Complete Call Graphs for Post-Mortem Program Trace Analysis

ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
Performance and Scalability Analysis of Teraflop-Scale Parallel Architectures Using Multidimensional Wavefront Applications

International Journal of High Performance Computing Applications
A hybrid hardware/software approach to efficiently determine cache coherence Bottlenecks

Proceedings of the 19th annual international conference on Supercomputing
Low-overhead call path profiling of unmodified, optimized code

Proceedings of the 19th annual international conference on Supercomputing
The Tau Parallel Performance System

International Journal of High Performance Computing Applications
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies

ACM Transactions on Programming Languages and Systems (TOPLAS)
Preserving time in large-scale communication traces

Proceedings of the 22nd annual international conference on Supercomputing
Scalable load-balance measurement for SPMD codes

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
MPIWiz: subgroup reproducible replay of mpi applications

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing

Journal of Parallel and Distributed Computing
FACT: fast communication trace collection for parallel applications through program slicing

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
HPCTOOLKIT: tools for performance analysis of optimized parallel programs http://hpctoolkit.org

Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing
Construction and evaluation of coordinated performance skeletons

HiPC'08 Proceedings of the 15th international conference on High performance computing
Clustering performance data efficiently at massive scales

Proceedings of the 24th ACM International Conference on Supercomputing
Scalable Communication Trace Compression

CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
ScalaExtrap: trace-based communication extrapolation for spmd programs

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs?

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Automatic generation of executable communication specifications from parallel applications

Proceedings of the international conference on Supercomputing
Probabilistic Communication and I/O Tracing with Deterministic Replay at Scale

ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
Introducing the open trace format (OTF)

ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
ScalaBenchGen: Auto-Generation of Communication Benchmarks Traces

IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium

Quantified Score

Hi-index	0.00

Visualization

Abstract

SCALATRACE represents the state-of-the-art of parallel application tracing for high performance computing (HPC). This paper presents SCALATRACE II, a next generation tracer that delivers even higher trace compression capability, even when events are not always regular. In this work, we contribute a spectrum of novel compression and replay techniques that are fundamentally different from our past approaches. SCALATRACE II features a redesigned low-level encoding scheme of trace data such that data elements are elastic and self explanatory. With this new encoding scheme, trace compression is enhanced by introducing innovative intra-node and inter-node trace compression algorithms that guarantee high compression rates in a loop structure agnostic fashion. In practice, the improved compression scheme is particularly efficient for scientific codes that demonstrate inconsistent behavior across time steps and nodes. A novel approach is further contributed to probabilistically replay sequences of non-deterministic events. To assess the compression efficacy of SCALATRACE II, we conduct experiments not only with computational kernels but also a real-world application, the Parallel Ocean Program (POP). Compared to the first generation SCALATRACE, we observe key improvements on trace compression for benchmarks with inconsistent time step behavior and diverging task level behavior while retaining timing accuracy even under probabilistic replay.