NAS Parallel Benchmark Results
IEEE Parallel & Distributed Technology: Systems & Technology
Automatic performance analysis of hybrid MPI/OpenMP applications
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Evolutions in parallel distributed and network-based processing
Performance Tool Support for MPI-2 on Linux
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
A Scalable Implementation of a Finite-Volume Dynamical Core in the Community Atmosphere Model
International Journal of High Performance Computing Applications
The Tau Parallel Performance System
International Journal of High Performance Computing Applications
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Replay-Based Synchronization of Timestamps in Event Traces of Massively Parallel Applications
ICPPW '08 Proceedings of the 2008 International Conference on Parallel Processing - Workshops
Scalable parallel trace-based performance analysis
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Specification of inefficiency patterns for MPI-2 one-sided communication
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Hi-index | 0.00 |
Wait states in parallel applications can be identified by scanning event traces for characteristic patterns. In our earlier work, we have defined such patterns for mpi -2 one-sided communication, although still based on a trace-analysis scheme with limited scalability. Taking advantage of a new scalable trace-analysis approach based on a parallel replay, which was originally developed for mpi -1 point-to-point and collective communication, we show how wait states in one-sided communications can be detected in a more scalable fashion. We demonstrate the scalability of our method and its usefulness for the optimization cycle with applications running on up to 8,192 cores.