IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
An overview of the BlueGene/L Supercomputer
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
An API for Runtime Code Patching
International Journal of High Performance Computing Applications
BlueGene/L applications: Parallelism On a Massive Scale
International Journal of High Performance Computing Applications
Lessons learned at 208K: towards debugging millions of cores
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Modular, Fine-Grained Adaptation of Parallel Programs
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Bypassing races in live applications with execution filters
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Hi-index | 0.00 |
Dynamic binary instrumentation for performance analysis on new, large scale architectures such as the IBM Blue Gene/L system (BG/L) poses new challenges. Their scale---with potentially hundreds of thousands of compute nodes---requires new, more scalable mechanisms to deploy and to organize binary instrumentation and to collect the resulting data gathered by the inserted probes. Further, many of these new machines don't support full operating systems on the compute nodes; rather, they rely on light-weight custom compute kernels that do not support daemon-based implementations.We describe the design and current status of a new implementation of the DPCL (Dynamic Probe Class Library) API for BG/L. DPCL provides an easy to use layer for dynamic instrumentation on parallel MPI applications based on the DynInst dynamic instrumentation mechanism for sequential platforms. Our work includes modifying DynInst to control instrumentation from remote I/O nodes and porting DPCL's communication to use MRNet, a scalable data reduction network for collecting performance data. We describe extensions to the DPCL API that support instrumentation of task subsets and aggregation of collected performance data. Overall, our implementation provides a scalable infrastructure that provides efficient binary instrumentation on BG/L.