Waiting time analysis and performance visualization in Carnival
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Proceedings of the 14th international conference on Supercomputing
MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
MPI performance analysis tools on Blue Gene/L
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Compensation of Measurement Overhead in Parallel Performance Profiling
International Journal of High Performance Computing Applications
Automatic analysis of inefficiency patterns in parallel applications: Research Articles
Concurrency and Computation: Practice & Experience - European–American Working Group on Automatic Performance Analysis (APART)
Scientific Programming - Large-Scale Programming Tools and Environments
Scalable load-balance measurement for SPMD codes
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Effective performance measurement and analysis of multithreaded applications
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Binary analysis for measurement and attribution of program performance
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing
Journal of Parallel and Distributed Computing
Diagnosing performance bottlenecks in emerging petascale applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Analyzing lock contention in multithreaded applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
HPCTOOLKIT: tools for performance analysis of optimized parallel programs http://hpctoolkit.org
Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing
The Scalasca performance toolset architecture
Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing
Clustering performance data efficiently at massive scales
Proceedings of the 24th ACM International Conference on Supercomputing
Performance analysis for parallel programs from multicore to petascale
Performance analysis for parallel programs from multicore to petascale
Detecting application load imbalance on high end massively parallel systems
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Scalable fine-grained call path tracing
Proceedings of the international conference on Supercomputing
Quantifying the effectiveness of load balance algorithms
Proceedings of the 26th ACM international conference on Supercomputing
Understanding the formation of wait states in applications with one-sided communication
Proceedings of the 20th European MPI Users' Group Meeting
A data-centric profiler for parallel programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Applications must scale well to make efficient use of today's class of petascale computers, which contain hundreds of thousands of processor cores. Inefficiencies that do not even appear in modest-scale executions can become major bottlenecks in large-scale executions. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of scaling problems. Load imbalance is one of the most common scaling problems. To provide actionable insight into load imbalance, we present post-mortem parallel analysis techniques for pinpointing and quantifying load imbalance in the context of call path profiles of parallel programs. We show how to identify load imbalance in its static and dynamic context by using only low-overhead asynchronous call path profiling to locate regions of code responsible for communication wait time in SPMD executions. We describe the implementation of these techniques within HPCTOOLKIT.