Extending a traditional debugger to debug massively parallel applications

Authors:
Susanne M. Balle;Bevin R. Brett;Chih-Ping Chen;David LaFrance-Linden
Affiliations:
Hewlett-Packard, MS ZKO2-3/Q08, 110 Spit Brook Road, Nashua, NH;Hewlett-Packard, MS ZKO2-3/Q08, 110 Spit Brook Road, Nashua, NH and Intel Computer Corporation, 110 Spit Brook Road, Nashua, NH;Hewlett-Packard, MS ZKO2-3/Q08, 110 Spit Brook Road, Nashua, NH and Intel Computer Corporation, 110 Spit Brook Road, Nashua, NH;Hewlett-Packard, MS ZKO2-3/Q08, 110 Spit Brook Road, Nashua, NH
Venue:
Journal of Parallel and Distributed Computing
Year:
2004

Abstract

Beowulf systems, and other proprietary approaches, are placing systems with four or more CPUs in the hands of many researchers and commercial users. In the near future, systems with hundreds of CPUs will become commonly available, with some programmers dealing with tens of thousands of CPUs. The debugging methods used on these systems are a combination of the traditional methods used for debugging single processes and ad hoc methods to help the user cope with the multitude of processes. Programmers are usually familiar with a single-process debugger and would like to use it (with minimal user-visible extensions) to debug their distributed program.

We present a set of modifications to a traditional debugger that make it capable of debugging applications running on thousands of processes. Our parallel debugger is composed of individual, fully functional debuggers connected by an n-ary aggregating network. This permits us to present to users the results from every debugger at the same time in an aggregated fashion. Users get a global view of the application and can easily see whether a given parameter has a value that differs either from what they expect or from its value in the other processes. Users can then focus on the process sets of interest and investigate the problem.

One challenge when debugging thousands of processes is dealing with the amount of output coming from all the debuggers. We present methods to aggregate the overwhelming amount of output from the debuggers into a more manageable subset, which is presented to the user without losing information.

Experiments show that the debugger scales to thousands of processors. Both the startup mechanism and the response time to user commands scale well. The conclusions presented regarding the architecture and the scalability of the new parallel debugger are not specific to the serial debugger used in our example implementation.
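The aggregation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fan-out of 8, and the `start:end` rank-range notation are all assumptions made for illustration. Each leaf stands in for one per-process debugger, interior nodes merge the results of up to `fanout` children, and identical outputs are collapsed into a single entry tagged with the set of processes that produced it, so no information is lost.

```python
from collections import defaultdict


def leaf(rank, output):
    """Result of one (hypothetical) per-process debugger: output text -> set of ranks."""
    return {output: {rank}}


def merge(partials):
    """Combine child results; identical outputs share one entry with the union of ranks."""
    merged = defaultdict(set)
    for partial in partials:
        for output, ranks in partial.items():
            merged[output] |= ranks
    return dict(merged)


def aggregate(results, fanout):
    """Reduce per-process results level by level through a tree with the given fan-out."""
    level = list(results)
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else {}


def format_ranks(ranks):
    """Compress a non-empty set of ranks into ranges, e.g. {0, 1, 2, 5} -> '0:2,5'."""
    ordered = sorted(ranks)
    spans = []
    start = prev = ordered[0]
    for r in ordered[1:]:
        if r == prev + 1:
            prev = r
            continue
        spans.append((start, prev))
        start = prev = r
    spans.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}:{b}" for a, b in spans)


def demo():
    # 1000 hypothetical processes print a variable; rank 137 disagrees with the rest.
    results = [leaf(rank, "x = 7" if rank == 137 else "x = 3") for rank in range(1000)]
    summary = aggregate(results, fanout=8)
    return {format_ranks(ranks): output for output, ranks in summary.items()}
```

In this sketch, a thousand lines of raw output reduce to two entries, and the one process whose value differs from the others stands out immediately while every rank remains accounted for.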