To better understand the formation of wait states in MPI programs, and to help users find optimization targets when load imbalance (a major source of wait states) is present, our earlier work added two new trace-analysis techniques to Scalasca, a performance-analysis tool designed for large-scale applications. These techniques were originally restricted to two-sided and collective MPI communication. In this paper, we show how they are extended to cover one-sided communication as well. We report our experiences with benchmark programs and with a mini-application representing the core of the POP ocean model.
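As a rough illustration of how load imbalance turns into wait states, the sketch below models a single collective operation that is assumed to synchronize all participating ranks (barrier-like semantics). Each rank waits from its own entry time until the last rank enters, and the last-arriving rank is the one whose extra work caused the others to wait. This is a hypothetical, simplified model written for this explanation, not Scalasca's implementation; the function name `wait_states_at_sync` and the input format (a map from rank to entry timestamp) are assumptions.

```python
from typing import Dict, Tuple


def wait_states_at_sync(entry_times: Dict[int, float]) -> Tuple[Dict[int, float], int]:
    """Hypothetical model of wait states at one synchronizing collective.

    Assumes no rank can leave the operation before the last rank has
    entered it. Returns the waiting time of every rank and the rank that
    arrived last (the one whose delay caused the others to wait).
    """
    if not entry_times:
        raise ValueError("need at least one rank")
    # The latest entry determines when all ranks are released.
    last_rank = max(entry_times, key=lambda rank: entry_times[rank])
    release_time = entry_times[last_rank]
    # Each rank idles from its own entry until the release time.
    waits = {rank: release_time - t for rank, t in entry_times.items()}
    return waits, last_rank


if __name__ == "__main__":
    waits, culprit = wait_states_at_sync({0: 1.0, 1: 1.5, 2: 3.0})
    print(waits, culprit)
    print("total waiting time:", sum(waits.values()))
```

With entry times of 1.0, 1.5, and 3.0 for ranks 0 to 2, ranks 0 and 1 wait 2.0 and 1.5 time units, rank 2 waits not at all, and the total waiting time of 3.5 is attributable to rank 2 arriving late. Real trace analyses must also handle point-to-point and one-sided patterns, where the waiting relationship is between specific pairs of ranks rather than all participants.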