Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs

Authors:
Sandeep Budanur;Frank Mueller;Todd Gamblin
Affiliations:
North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA;Lawrence Livermore National Laboratory, Livermore, CA, USA
Venue:
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Year:
2011

Abstract

Concurrency levels in large-scale supercomputers are rising exponentially, and shared-memory nodes with hundreds of cores and non-uniform memory access latencies are expected within the next decade. However, even current petascale systems with tens of cores per node suffer from memory bottlenecks. As core counts increase, memory issues become critical for the performance of large-scale supercomputers. Trace analysis tools are vital for diagnosing the root causes of memory problems. However, existing tools are expensive due to prohibitively large trace sizes, or they collect only statistical summaries that omit valuable information. In this paper, we present ScalaMemTrace, a novel technique for collecting memory traces in a scalable manner. ScalaMemTrace builds on prior trace methods with aggressive compression techniques to allow lossless representation of memory traces for dense algebraic kernels, with near-constant trace size irrespective of the problem size or the number of threads. We further introduce a replay mechanism for ScalaMemTrace traces, and discuss the results of our prototype implementation on the x86-64 architecture.
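
To give a rough intuition for how a strided memory trace can be stored losslessly in near-constant space, the following is a minimal sketch that folds constant-stride address runs into (start, stride, count) records and expands them again for replay. It is an illustration only, not the ScalaMemTrace algorithm or its trace format: the record layout and the function names `compress` and `replay` are assumptions made for this example, and the nested (extended PRSD) descriptors and cross-thread compression named in the title are not modeled.

```python
from typing import List, Tuple

# Hypothetical record: (start address, stride in bytes, number of accesses).
# This layout is an assumption for illustration, not ScalaMemTrace's format.
Record = Tuple[int, int, int]


def compress(addresses: List[int]) -> List[Record]:
    """Greedily fold runs of constant-stride addresses into single records."""
    records: List[Record] = []
    i, n = 0, len(addresses)
    while i < n:
        start = addresses[i]
        if i + 1 == n:
            # A lone trailing access becomes a one-element record.
            records.append((start, 0, 1))
            break
        stride = addresses[i + 1] - start
        count = 2
        # Extend the run while consecutive accesses keep the same stride.
        while i + count < n and addresses[i + count] - addresses[i + count - 1] == stride:
            count += 1
        records.append((start, stride, count))
        i += count
    return records


def replay(records: List[Record]) -> List[int]:
    """Expand records back into the original address sequence."""
    return [start + k * stride for start, stride, count in records for k in range(count)]


# A unit-stride sweep over 8-byte elements, as in a dense kernel's inner loop.
small = [0x1000 + 8 * k for k in range(100)]
large = [0x1000 + 8 * k for k in range(100_000)]

# Both sweeps compress to one record, regardless of the problem size.
assert len(compress(small)) == len(compress(large)) == 1
# The representation is lossless: replay reproduces the trace exactly.
assert replay(compress(large)) == large
```

In this sketch the compressed size depends on the number of distinct access patterns rather than on the number of accesses, which is the property the abstract describes for dense algebraic kernels; irregular access streams would not compress this well under such a scheme.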