Scalable fine-grained call path tracing

Authors:
Nathan R. Tallent;John Mellor-Crummey;Michael Franco;Reed Landrum;Laksono Adhianto
Affiliations:
Rice University, Houston, TX, USA;Rice University, Houston, TX, USA;Rice University, Houston, TX, USA;Stanford University, Stanford, CA, USA;Rice University, Houston, TX, USA
Venue:
Proceedings of the international conference on Supercomputing
Year:
2011

Abstract

Applications must scale well to make efficient use of even medium-scale parallel systems. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of performance bottlenecks. Although tracing is a powerful performance-analysis technique, tools that employ it can quickly become bottlenecks themselves. Moreover, to obtain actionable performance feedback for modular parallel software systems, it is often necessary to collect and present fine-grained, context-sensitive data, which is the very thing scalable tools avoid. Existing tracing tools can collect calling contexts, but only in a coarse-grained fashion, and no prior tool scalably presents data that is both context- and time-sensitive. This paper describes how to collect, analyze, and present fine-grained call path traces for parallel programs. To scale our measurements, we use asynchronous sampling, whose granularity is controlled by a sampling frequency, together with a compact representation of the collected traces. To present traces at multiple levels of abstraction and at arbitrary resolutions, we use sampling to render complementary slices of calling-context-sensitive trace data. Because our techniques are general, they can be used on applications that use different parallel programming models (MPI, OpenMP, PGAS). This work is implemented in HPCToolkit.
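
The two ideas in the abstract, a compact representation of sampled call paths and rendering a trace by sampling it again at display resolution, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `CallingContextTree`, `record`, and `render_row` are hypothetical, the calling context tree is one plausible compact representation, and none of this reflects HPCToolkit's actual file format or API. Each trace record holds only a timestamp and one integer naming a call path, and a displayed row takes one sample per pixel, so rendering cost depends on the pixel count rather than the trace length.

```python
from bisect import bisect_right


class CallingContextTree:
    """Stores each distinct call path once; a path is named by its leaf node id."""

    def __init__(self):
        self.parent = [None]   # node 0 is the root
        self.frame = ["<root>"]
        self._child = {}       # (parent id, frame name) -> node id

    def intern(self, call_path):
        """Return the node id for a call path given outermost frame first."""
        node = 0
        for name in call_path:
            key = (node, name)
            if key not in self._child:
                self._child[key] = len(self.parent)
                self.parent.append(node)
                self.frame.append(name)
            node = self._child[key]
        return node

    def path(self, node):
        """Rebuild the call path (outermost frame first) from a node id."""
        frames = []
        while node != 0:
            frames.append(self.frame[node])
            node = self.parent[node]
        return frames[::-1]


def record(trace, cct, timestamp, call_path):
    """Append one sample: a timestamp plus a single integer for the context."""
    trace.append((timestamp, cct.intern(call_path)))


def render_row(trace, cct, t_start, t_end, pixels, depth):
    """Pick one sample per pixel and report the frame at the given depth.

    depth is 1-based (1 = outermost frame); paths shallower than depth
    report their deepest frame. Assumes trace is sorted by timestamp.
    """
    times = [t for t, _ in trace]
    width = (t_end - t_start) / pixels
    row = []
    for p in range(pixels):
        mid = t_start + (p + 0.5) * width
        i = bisect_right(times, mid) - 1   # latest sample at or before mid
        if i < 0:
            row.append(None)               # no sample yet at this time
            continue
        frames = cct.path(trace[i][1])
        row.append(frames[min(depth, len(frames)) - 1] if frames else None)
    return row


# Hypothetical samples from one thread; the function names are made up.
cct = CallingContextTree()
trace = []
record(trace, cct, 0.0, ["main", "solve", "mpi_wait"])
record(trace, cct, 1.0, ["main", "solve", "compute"])
record(trace, cct, 2.0, ["main", "solve", "mpi_wait"])
record(trace, cct, 3.0, ["main", "io"])

coarse = render_row(trace, cct, 0.0, 4.0, pixels=2, depth=2)
fine = render_row(trace, cct, 0.0, 4.0, pixels=4, depth=3)
```

Changing `depth` shows the same time interval at a different level of abstraction, and changing `pixels` or the time bounds changes the resolution. In both cases the stored trace is left untouched.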