Extending a traditional debugger to debug massively parallel applications
Journal of Parallel and Distributed Computing
Systems with hundreds or even tens of thousands of CPUs are now in the hands of many researchers and commercial users. Debugging on these systems relies on a mix of traditional and ad hoc methods. Programmers are usually familiar with a serial debugger and would like to use it to debug their distributed programs. We present a set of modifications to a traditional debugger that make it capable of debugging massively parallel applications. Our parallel debugger is composed of individual, fully functional debuggers connected by an n-ary aggregating network that condenses their outputs. This allows us to present the user with a global view of the application, so the user can more easily see whether a given parameter has an unexpected value on some processes and then focus on the problem. Experiments show that the debugger scales to thousands of processors in both startup time and response time to user commands.
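The abstract does not spell out how the aggregating network condenses debugger outputs, so the following is only a minimal sketch under stated assumptions: each process's debugger returns one line of text per command, identical lines are grouped by the set of ranks that produced them, and partial results are merged level by level with a fixed fanout. The function names (`merge`, `aggregate`, `format_ranks`) and the `fanout` parameter are hypothetical, and the tree is simulated in a single process rather than over a network.

```python
from collections import defaultdict


def merge(partials):
    """Combine several {output_text: set_of_ranks} maps into one map."""
    merged = defaultdict(set)
    for partial in partials:
        for output, ranks in partial.items():
            merged[output] |= ranks
    return dict(merged)


def aggregate(outputs, fanout=4):
    """Reduce per-rank outputs level by level, as an n-ary tree would.

    outputs[i] is the text returned by the debugger attached to rank i
    (an assumption made for this sketch).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Leaves: every rank starts with its own single-entry map.
    level = [{text: {rank}} for rank, text in enumerate(outputs)]
    # Each pass models one tree level: every parent merges up to `fanout` children.
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else {}


def format_ranks(ranks):
    """Render a non-empty rank set compactly, e.g. {0, 1, 2, 5} -> '0-2,5'."""
    ordered = sorted(ranks)
    spans = []
    start = prev = ordered[0]
    for r in ordered[1:]:
        if r == prev + 1:
            prev = r
            continue
        spans.append((start, prev))
        start = prev = r
    spans.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)


# Sixteen ranks print the same variable; rank 5 disagrees with the rest.
outputs = ["x = 42"] * 16
outputs[5] = "x = -1"
view = aggregate(outputs, fanout=4)
for text, ranks in sorted(view.items(), key=lambda kv: min(kv[1])):
    print(f"[{format_ranks(ranks)}] {text}")
```

In this sketch the root holds one entry per distinct output rather than one per process, which is what lets an outlier such as rank 5 stand out in the condensed view.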