Measuring causal propagation of overhead of inefficiencies in parallel applications

Authors: Hassan M. Jafri
Affiliation: University of Illinois at Urbana-Champaign
Venue: PDCS '07, Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Year: 2007

Abstract

Parallel applications are notoriously difficult to debug for performance. Automatic performance analysis techniques, such as those used by Kojak and KappaPI, are a promising way to ease the discovery of performance inefficiencies in parallel applications. However, as we show in this paper, the results produced by these tools can be misleading and sometimes outright incorrect. The reason is that the overhead caused by a performance inefficiency originating at one point in the program can causally propagate and manifest itself at other points. Current techniques perform a flat analysis; that is, they do not account for causal propagation. In this paper, we present a causal analysis method that can be retrofitted onto current analysis techniques so that they account for the causal propagation of overhead and arrive at a more accurate description of performance bottlenecks. We also show several ways in which this technique improves the effectiveness of automatic performance analysis. In this paper, we tackle only overhead related to communication operations in MPI parallel applications. In general, however, our technique can also be applied to overhead unrelated to communication, and to any parallel programming paradigm.
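To make the difference between a flat and a causal analysis concrete, here is a toy sketch in Python. It is not the method presented in the paper. The linear pipeline model, the zero-cost sends, the counterfactual attribution rule, and the function names `simulate`, `flat_analysis`, and `causal_analysis` are all hypothetical and chosen only for illustration. Under the assumed rule, the part of a wait that would remain if the upstream sender had not itself been waiting is charged to that sender's own computation. The remainder is treated as propagated and is split the same way as the upstream wait.

```python
"""Toy illustration of flat vs. causal attribution of wait-time overhead.

Hypothetical model (not the paper's method): a linear pipeline of N
processes. Process i computes for compute[i] time units, then (for i > 0)
blocks on a receive from process i-1, then (for i < N-1) sends to process
i+1. Sends and message transfer are assumed to take zero time.
"""
from collections import defaultdict


def simulate(compute):
    """Return (waits, send_times) for the assumed pipeline model.

    waits[i] is the time process i spends blocked in its receive
    (a "late sender" wait); waits[0] is always 0.
    """
    waits = [0.0] * len(compute)
    send_times = [float(compute[0])]
    for i in range(1, len(compute)):
        # Process i is ready to receive at compute[i]; the message leaves at send_times[i-1].
        waits[i] = max(0.0, send_times[i - 1] - compute[i])
        send_times.append(compute[i] + waits[i])
    return waits, send_times


def flat_analysis(waits):
    """Flat view: each wait is charged to its immediate sender only."""
    return {i - 1: waits[i] for i in range(1, len(waits)) if waits[i] > 0}


def causal_analysis(compute, waits):
    """Causal view: charge each wait to the process whose computation started it.

    The part of waits[i] that would remain if process i-1 had not waited,
    max(0, compute[i-1] - compute[i]), is local to process i-1. The rest was
    propagated through the wait of process i-1 and is split like that wait.
    """
    origins = [dict() for _ in compute]  # origins[i]: root process -> share of waits[i]
    totals = defaultdict(float)
    for i in range(1, len(compute)):
        if waits[i] == 0:
            continue
        local = max(0.0, compute[i - 1] - compute[i])
        propagated = waits[i] - local
        share = defaultdict(float)
        if local > 0:
            share[i - 1] += local
        if propagated > 0:
            # propagated > 0 implies waits[i-1] > 0, so the division is safe.
            scale = propagated / waits[i - 1]
            for root, amount in origins[i - 1].items():
                share[root] += amount * scale
        origins[i] = dict(share)
        for root, amount in share.items():
            totals[root] += amount
    return dict(totals)


if __name__ == "__main__":
    compute = [10, 2, 2, 2]  # process 0 has a long computation phase
    waits, _ = simulate(compute)
    print(flat_analysis(waits))             # {0: 8.0, 1: 8.0, 2: 8.0}
    print(causal_analysis(compute, waits))  # {0: 24.0}
```

In this example a flat analysis reports three separate late-sender waits, charged to processes 0, 1, and 2, while the causal view traces all 24 time units back to the long computation on process 0. With compute times `[10, 6, 2]`, the causal view instead splits the 8-unit wait on process 2 evenly between process 1's own excess computation and overhead propagated from process 0.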