Parallel debugging faces challenges in both scalability and efficiency. A number of advanced methods have been developed to improve its efficiency. As systems grow in scale, these methods rely heavily on a scalable communication protocol to remain usable in large-scale distributed environments. This paper describes a debugging middleware that provides fundamental debugging functions and supports multiple communication protocols. Its pluggable architecture lets users select a suitable communication protocol as a plug-in for each platform. The middleware is intended to serve as a common foundation for various advanced debugging techniques across different computing platforms. Its performance is evaluated on a Cray XE supercomputer with 21,760 CPU cores.
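The pluggable design described above might be sketched as follows. This is only an illustration of the general plug-in pattern, not the middleware's actual interface: the names `CommPlugin`, `LoopbackPlugin`, `register_plugin`, and `DebugMiddleware`, as well as the command string format, are all hypothetical and do not come from the paper.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class CommPlugin(ABC):
    """Hypothetical interface that every communication plug-in implements."""

    @abstractmethod
    def broadcast(self, command: str, ranks: List[int]) -> Dict[int, str]:
        """Send a debugger command to the given ranks and collect one reply per rank."""


class LoopbackPlugin(CommPlugin):
    """Stand-in transport that answers locally.

    A real plug-in would wrap an actual protocol (for example sockets or a
    tree-based overlay network); this one exists only to make the sketch runnable.
    """

    def broadcast(self, command: str, ranks: List[int]) -> Dict[int, str]:
        return {rank: f"rank {rank}: ok {command}" for rank in ranks}


# Registry mapping a protocol name to a factory that builds the plug-in.
_PLUGINS: Dict[str, Callable[[], CommPlugin]] = {}


def register_plugin(name: str, factory: Callable[[], CommPlugin]) -> None:
    """Make a communication plug-in selectable by name."""
    _PLUGINS[name] = factory


class DebugMiddleware:
    """Exposes basic debugging functions on top of whichever plug-in is selected."""

    def __init__(self, protocol: str) -> None:
        if protocol not in _PLUGINS:
            raise ValueError(f"unknown communication protocol: {protocol}")
        self._comm = _PLUGINS[protocol]()

    def set_breakpoint(self, location: str, ranks: List[int]) -> Dict[int, str]:
        # The debugging function is written once; only the transport varies.
        return self._comm.broadcast(f"break {location}", ranks)


register_plugin("loopback", LoopbackPlugin)
```

Under these assumptions, `DebugMiddleware("loopback").set_breakpoint("solver.c:42", [0, 1])` returns one reply per rank, and moving to a different platform would mean registering and selecting another plug-in while leaving the debugging functions unchanged.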