Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and process application data. In addition, at such scales, each tool will itself become a large parallel application: already, debugging the full BlueGene/L (BG/L) installation at Lawrence Livermore National Laboratory requires 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an InfiniBand cluster and results at up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
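To make the idea of merging stack traces into process equivalence classes concrete, the following is a minimal, single-process sketch. It is not STAT's implementation: the function names (`equivalence_classes`, `merge_prefix_tree`), the dictionary-based tree, and the sample traces are all assumptions made for illustration. A real tool would gather the traces from remote daemons and merge them hierarchically, not in one process.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group ranks whose stack traces are identical (hypothetical helper)."""
    classes = defaultdict(list)
    for rank, frames in sorted(traces.items()):
        classes[tuple(frames)].append(rank)
    return dict(classes)


def merge_prefix_tree(traces):
    """Merge traces into a call-prefix tree (hypothetical helper).

    Each node records the set of ranks whose trace passes through it.
    """
    root = {"ranks": set(), "children": {}}
    for rank, frames in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}}
            )
            node["ranks"].add(rank)
    return root


# Illustrative input: rank -> call stack, outermost frame first (made-up data).
traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "compute"],
    3: ["main", "solve", "MPI_Barrier"],
}

classes = equivalence_classes(traces)
tree = merge_prefix_tree(traces)

for frames, ranks in classes.items():
    print(" > ".join(frames), "ranks:", ranks)
```

In this sketch, three ranks share one trace and a single rank diverges, so a user would need to attach a full debugger to only one representative per class instead of every process.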