Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Authors:
David Bohme;Markus Geimer;Felix Wolf;Lukas Arnold
Venue:
ICPP '10 Proceedings of the 2010 39th International Conference on Parallel Processing
Year:
2010

Abstract

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. By replaying event traces in parallel in both forward and backward directions, we can identify the processes and call paths responsible for the most severe imbalances even for runs with tens of thousands of processes.