Logical Time in Distributed Computing Systems
Computer - Distributed computing systems: separate resources acting as one
Proceedings of the 14th international conference on Supercomputing
Modeling and detecting performance problems for distributed and parallel programs with JavaPSL
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Extending the scope of the controlled logical clock
Cluster Computing
Parallel applications are notoriously difficult to debug for performance. Automatic performance analysis techniques, such as those used by Kojak and KappaPI, promise to make it easier to discover performance inefficiencies in parallel applications. However, as we show in this paper, the results produced by these tools can be misleading and sometimes outright incorrect. The reason is that overhead caused by an inefficiency at one point in the program can causally propagate and manifest itself at other points. Current techniques perform a flat analysis, i.e., they attribute overhead to the point where it is observed and do not account for causal propagation. In this paper, we present a method of causal analysis with which current analysis techniques can be retrofitted, so that they account for causal propagation of overhead and arrive at a more accurate description of performance bottlenecks. We also show several advantages this technique offers for improving the effectiveness of automatic performance analysis. In this paper, we tackle only overhead related to communication operations in MPI parallel applications. In general, however, our technique can be applied to non-communication-related overhead in any parallel programming paradigm.
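The difference between flat and causal attribution can be illustrated with a toy sketch. The code below is not the method of the paper; it is a minimal illustration under strong assumptions: every waiting time is observed at a blocking receive, each wait has a single sender, and a wait is blamed entirely on the process at the start of its dependency chain. The names `Wait`, `flat_blame`, `causal_blame`, and `EXAMPLE_WAITS` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Wait:
    """One waiting period observed at a blocking receive (hypothetical record)."""
    proc: str                          # process on which the wait was observed
    seconds: float                     # length of the observed wait
    sender: str                        # process whose message arrived late
    sender_wait: Optional[str] = None  # key of the wait that made the sender late, if any


def flat_blame(waits: Dict[str, Wait]) -> Dict[str, float]:
    """Flat analysis: charge each wait to the process where it was observed."""
    blame: Dict[str, float] = {}
    for w in waits.values():
        blame[w.proc] = blame.get(w.proc, 0.0) + w.seconds
    return blame


def causal_blame(waits: Dict[str, Wait]) -> Dict[str, float]:
    """Toy causal analysis: follow sender_wait links back to the first wait
    in the chain and charge the whole wait to that wait's sender.
    Assumes the links form no cycles."""
    blame: Dict[str, float] = {}
    for w in waits.values():
        origin = w
        while origin.sender_wait is not None:
            origin = waits[origin.sender_wait]
        blame[origin.sender] = blame.get(origin.sender, 0.0) + w.seconds
    return blame


# Assumed scenario: P0 sends 5 s late to P1; P1 therefore sends late to P2.
EXAMPLE_WAITS = {
    "w1": Wait(proc="P1", seconds=5.0, sender="P0"),
    "w2": Wait(proc="P2", seconds=5.0, sender="P1", sender_wait="w1"),
}

if __name__ == "__main__":
    print(flat_blame(EXAMPLE_WAITS))    # {'P1': 5.0, 'P2': 5.0}
    print(causal_blame(EXAMPLE_WAITS))  # {'P0': 10.0}
```

In this assumed scenario the flat view reports two separate 5-second problems on P1 and P2 and none on P0, while the causal view charges all 10 seconds to P0, where the delay originated. A realistic analysis would also need to split a wait between several causes when only part of it is inherited.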