Lessons learned at 208K: towards debugging millions of cores

Authors:
Gregory L. Lee;Dong H. Ahn;Dorian C. Arnold;Bronis R. de Supinski;Matthew Legendre;Barton P. Miller;Martin Schulz;Ben Liblit
Affiliations:
Lawrence Livermore National Laboratory, Livermore, CA;Lawrence Livermore National Laboratory, Livermore, CA;University of Wisconsin, Madison, WI;Lawrence Livermore National Laboratory, Livermore, CA;University of Wisconsin, Madison, WI;University of Wisconsin, Madison, WI;Lawrence Livermore National Laboratory, Livermore, CA;University of Wisconsin, Madison, WI
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Abstract

Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, so tools must use scalable data structures and analysis algorithms to collect and process application data. At such scales, each tool itself becomes a large parallel application: debugging the full Blue Gene/L (BG/L) installation at Lawrence Livermore National Laboratory already requires 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an InfiniBand cluster and at up to 208K processes on BG/L to identify current scalability issues as well as challenges that will arise at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
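
The idea of merging stack traces to identify process equivalence classes can be illustrated with a minimal sketch. This is not STAT's implementation: the function names, the data layout, and the sample traces are all hypothetical, and the sketch runs in a single process rather than across tool daemons. It assumes each task's stack trace is available as a list of function names, outermost frame first.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group task ranks whose stack traces are identical (hypothetical helper)."""
    classes = defaultdict(list)
    for rank, frames in sorted(traces.items()):
        classes[tuple(frames)].append(rank)
    return dict(classes)


def merge_prefix_tree(traces):
    """Merge traces into a call-prefix tree.

    Each node records the ranks whose trace passes through that frame,
    so tasks that share a call path share the same branch.
    """
    root = {"ranks": [], "children": {}}
    for rank, frames in sorted(traces.items()):
        node = root
        node["ranks"].append(rank)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"ranks": [], "children": {}}
            )
            node["ranks"].append(rank)
    return root


# Illustrative input only: four tasks, one of which diverges from the rest.
traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "compute"],
    3: ["main", "solve", "MPI_Barrier"],
}

classes = equivalence_classes(traces)
tree = merge_prefix_tree(traces)

for frames, ranks in classes.items():
    print(" > ".join(frames), "ranks:", ranks)
```

In this toy input, three tasks wait in a barrier while one is still computing, so the merged view reduces four traces to two classes. A developer could then attach a full debugger to one representative task per class rather than to every task.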