Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles

Authors:
Nathan R. Tallent;Laksono Adhianto;John M. Mellor-Crummey
Venue:
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2010

Abstract

Applications must scale well to make efficient use of today's class of petascale computers, which contain hundreds of thousands of processor cores. Inefficiencies that do not even appear in modest-scale executions can become major bottlenecks in large-scale executions. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of scaling problems. Load imbalance is one of the most common scaling problems. To provide actionable insight into load imbalance, we present post-mortem parallel analysis techniques for pinpointing and quantifying load imbalance in the context of call path profiles of parallel programs. We show how to identify load imbalance in its static and dynamic context by using only low-overhead asynchronous call path profiling to locate regions of code responsible for communication wait time in SPMD executions. We describe the implementation of these techniques within HPCTOOLKIT.
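
As a rough illustration of what quantifying load imbalance in the context of call paths can look like, the sketch below summarizes per-rank profile data for each call path. It is not the HPCTOOLKIT implementation. The input layout (one dictionary per rank, mapping a call path tuple to seconds), the function names, and the choice of "max minus mean" as the imbalance measure are all assumptions made for this example.

```python
from statistics import mean


def imbalance_by_call_path(profiles):
    """Summarize per-call-path time across ranks.

    profiles: list with one dict per rank (hypothetical layout), each
    mapping a call path tuple such as ("main", "solve") to seconds.
    A call path missing from a rank counts as 0.0 seconds there.
    """
    paths = set()
    for rank_profile in profiles:
        paths.update(rank_profile)

    report = {}
    for path in paths:
        times = [rank_profile.get(path, 0.0) for rank_profile in profiles]
        avg = mean(times)
        worst = max(times)
        report[path] = {
            "mean": avg,
            "max": worst,
            # Time the slowest rank spends beyond the average: an assumed,
            # simple estimate of what perfect balancing could recover.
            "excess": worst - avg,
            "ratio": worst / avg if avg > 0 else float("inf"),
        }
    return report


def rank_by_excess(report):
    """Return (call path, stats) pairs, most imbalanced first."""
    return sorted(report.items(), key=lambda kv: kv[1]["excess"], reverse=True)


# Illustrative data only: rank 3 computes longer while the others wait
# inside a collective operation.
COMPUTE = ("main", "solve", "compute")
WAIT = ("main", "solve", "MPI_Allreduce")
example_profiles = [
    {COMPUTE: 4.0, WAIT: 4.0},
    {COMPUTE: 4.0, WAIT: 4.0},
    {COMPUTE: 4.0, WAIT: 4.0},
    {COMPUTE: 8.0},
]

example_report = imbalance_by_call_path(example_profiles)
for path, stats in rank_by_excess(example_report):
    print(" > ".join(path), stats)
```

In this made-up data the compute call path shows the largest excess, while the time the other ranks spend in the collective is the matching symptom. Relating waiting time in communication back to the code regions that cause it is the kind of link the abstract describes.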